The Burning Question is a column that tackles some of the biggest questions in the intersection of science, technology, geopolitics and culture, shaping our world as we know it. The column will soon be expanded into a newsletter, and you can subscribe for updates here. Write to editor@theweek.in with comments, suggestions and questions.   

In the mid-19th century, when the cholera pandemic reached Britain, the common perception was that the disease was transmitted by ‘bad air’ or ‘bad smells’ from rotting organic matter. That was challenged by John Snow, an anaesthesiologist, who mapped deaths from the disease and noticed clusters among families who depended on a specific public water pump on Broad Street. As it turned out, the pump’s water was polluted by cholera-contaminated sewage from a nearby cesspit; cesspits sat under most homes at a time when proper sewer systems were at a premium. Since Snow’s intervention, which 'flattened the curve' of cholera, epidemiological and mathematical modelling of pandemics has come a long way.

What are some of the different models?

There are statistical simulations like the Gaussian distribution curve. Here, an exponential surge in infections at the start of the pandemic will be followed by a peak, and then a period of decay when state intervention, (acquired/induced) immunity, and other factors, all start playing their roles. There are three main parameters in this specific model—X represents the peak of the infections, Y marks the time (chronological representation) of the peak, and Z signifies the duration of the pandemic.
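To make the three parameters concrete, here is a minimal Python sketch of such a bell curve. It assumes that Z is expressed as a standard deviation in days (the description above only calls it the duration), and the numbers are made up purely for illustration.

```python
import math

def gaussian_daily_cases(day, peak_cases, peak_day, width_days):
    """Daily infections on `day` under a simple bell-curve model.

    peak_cases -> X (height of the peak)
    peak_day   -> Y (when the peak occurs)
    width_days -> Z (how long the wave lasts; assumed here to be a
                  standard deviation in days)
    """
    exponent = -((day - peak_day) ** 2) / (2 * width_days ** 2)
    return peak_cases * math.exp(exponent)

# Hypothetical wave: peak of 10,000 daily cases on day 60, width of 15 days.
curve = [gaussian_daily_cases(d, 10_000, 60, 15) for d in range(121)]
```

The curve rises steeply, tops out on the peak day, and falls away symmetrically, which is exactly why it struggles once interventions or variants make the decline look different from the rise.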

[Image: Ralph Beckett, via Wikimedia Commons]

Gaussian curves by and large fit the outbreak data across the globe, at least in the early stages before variants started appearing. The limitations of such models become apparent when we consider their predictive capacity: when it comes to foretelling the trajectory of the pandemic so that policies can be crafted around those approximations, the Gaussian model falls short.

[Figure: SIR model | via SUTRA paper]

However, there are numerous other, more sophisticated simulations. One of the most popular in the mechanistic category is the Kermack-McKendrick SIR model, which dates to the years following the Spanish flu of 1918 to 1920. SIR divides the population into three categories. First, there is the susceptible population (those at risk), denoted as S(t) at any point of time t. Then there is the infected populace (I), who are carriers of the disease and can infect the susceptible, and finally the removed (R), who have developed immunity to the disease (by whatever means) or died.

[Figure: SEIR model | via SUTRA paper]

Two major parameters are involved in this model—β (beta) and γ (gamma)—which decide the rate at which the susceptible population decays into infected, and further, removed categories. β is the probability of fresh infection arising out of contact between an infected and a susceptible person, and γ decides the rate at which infected persons move to the removed category.
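The interplay of β and γ is easier to see in a small simulation. The following Python sketch steps the classic SIR equations forward with a simple Euler method; the parameter values, the time step, and the function name are illustrative assumptions, not figures from any study cited here.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Step the SIR model forward; S, I and R are population fractions."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i * dt  # S -> I
            new_removals = gamma * i * dt       # I -> R
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history

# Hypothetical outbreak: beta = 0.3, gamma = 0.1, 0.1 per cent infected at the start.
sir_history = simulate_sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, days=200)
peak_infected = max(i for _, i, _ in sir_history)
```

Raising β (more infectious contact) makes the peak higher and earlier, while raising γ (faster removal) shrinks it; the three fractions always add up to the whole population.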

[Figure: SAIR model | via SUTRA paper]

As further epidemics arose, the SIR model was revised. When a malaria epidemic struck, an exposed (E) category was added to the model, resulting in SEIR, to account for the incubation period, during which a person is infected but not yet passing the infection to others.

In later years, models like SAIR were developed to account for possible asymptomatic spreaders. The mechanistic models then evolved into structured metapopulation models, which captured how different demographics mix across populations, and further into agent-based network models.

In the Indian context, the most talked-about model is SUTRA, a purely data-centric approach to predicting the trajectory of the pandemic, developed by IIT Kanpur, and backed by the Union government’s Department of Science and Technology (DST). Says Manindra Agrawal, deputy director of IIT Kanpur and co-founder of the model: “To identify the spread of the disease, epidemiologists would use something like the Google mobility data and population density of the region, combined with a general understanding of how people interact with each other. They will combine all that knowledge and come up with an estimate of the value of β. However, that is totally extraneous to what the data is saying. Our approach has been to simply allow the data, or use it, to infer the value of these parameters.”

SUTRA doesn’t take into account the human factor, rise in variants, virus behaviour, or anything of that nature. “We just look for what the data tells us. The data told us that the contact rate came down significantly during lockdown. It was data that told us that contact rate went through the roof in March,” said Agrawal.

So, what was the SUTRA model? As the research paper notes, while the SAIR model was the first one to make a clear distinction between asymptomatic and symptomatic patients, it does make one unrealistic assumption: All symptomatic persons get detected. “The logic is that persons with symptoms would present themselves to the health authorities, while asymptomatic persons would not. Over time, some asymptomatic patients would develop symptoms, at which time they too would present themselves to the health authorities. However, this is not how matters have evolved during the COVID-19 pandemic. Instead of the A and I groups, it is more realistic to have groups U for Undetected but infected, and T for Tested Positive,” according to the paper.

Thus, SUTRA was developed, with the letters S, U, T, R1, and R2, where R1 and R2 denote the groups of removed cases from U and T respectively. Here, susceptible individuals (S) become infected, and all infected remain undetected (U) for varying periods of time; the undetected then move either into the asymptomatic removed (R1) category, or into the tested positive (T) category and from there into the removed (R2) category.
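The flow between the five groups can be sketched in Python as below. This is only an illustration of the compartments described above, not the equations from the SUTRA paper: the `detection_rate` parameter, the assumption that only undetected cases infect others, and all the numbers are hypothetical choices made for this example.

```python
def simulate_sutra_sketch(beta, gamma, detection_rate, u0, days, dt=0.1):
    """Toy S -> U -> (R1 | T -> R2) flow; all groups are population fractions."""
    s, u, t, r1, r2 = 1.0 - u0, u0, 0.0, 0.0, 0.0
    history = [(s, u, t, r1, r2)]
    steps_per_day = round(1 / dt)
    for _day in range(days):
        for _step in range(steps_per_day):
            infected = beta * s * u * dt         # S -> U (assumed: only U spreads)
            detected = detection_rate * u * dt   # U -> T (hypothetical rate)
            removed_undetected = gamma * u * dt  # U -> R1
            removed_detected = gamma * t * dt    # T -> R2
            s -= infected
            u += infected - detected - removed_undetected
            t += detected - removed_detected
            r1 += removed_undetected
            r2 += removed_detected
        history.append((s, u, t, r1, r2))
    return history

sutra_history = simulate_sutra_sketch(
    beta=0.4, gamma=0.1, detection_rate=0.02, u0=0.001, days=300
)
final_s, final_u, final_t, final_r1, final_r2 = sutra_history[-1]
```

Even in this toy version, most of the epidemic passes through the undetected route (R1 ends up several times larger than R2), which is the situation the U and T groups were introduced to describe.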

Why the need for SUTRA? The novel coronavirus, SARS-CoV-2, has several distinctive features. The first is the large number of asymptomatic cases: according to certain statistics, almost 80-85 per cent of cases don’t show any symptoms. Then there is the question of unprecedented state intervention measures, like lockdowns, travel bans and a hitherto unparalleled scale of vaccine deployment.

This necessitated extra parameters: six, compared to three in earlier models. Apart from the standard β and γ, there is ε (epsilon, the ratio of detected to total infected), ρ (rho, the 'reach', or the fraction of the population affected by the pandemic), and others. If a lockdown is imposed, β changes. If the level of testing shifts, ε changes. Once the spread widens, ρ changes.

Changes in these parameters result in phase shifts in the pandemic. The study divides the entire timeline of the pandemic into phases, such that within each phase, the parameters are almost constant. “A phase change occurs when one of the parameter values changes significantly. It could be due to a quick change for reasons listed above, or accumulated slow change over an extended period,” according to the study. When a phase changes, parameter values drift for a period of time before stabilising, and simulations run during this drift period can result in erroneous assessments. The rest of the duration of the phase is called the stable period, during which the model can predict the future course of the pandemic as long as the parameter values do not change significantly.

So, how effective has SUTRA been in its predictions? According to the study, on April 29 SUTRA predicted that cases would peak between May 4 and 8 at around 390K cases, which proved a very good match. In July, for the US, it indicated a peak at the end of August at around 152K infections per day; the actual peak was on September 1 at 166K infections. “The graph equation holds for 62 per cent of the time. The rest is the drift period,” says Agrawal.

In SUTRA, the possibility of underreporting/missed data was automatically factored in—the model worked on available data. One of the most interesting findings, says Agrawal, was the presence of a very strong correlation between available data and the actual infections. “It was almost as if what was reported was a scaled down, structured version of reality”.

Excerpts from an edited interview with Agrawal, on SUTRA and some of the criticisms facing the model, and how different states have performed during the pandemic:

1. What are the different ways in which mathematical modelling can help us during a pandemic?

Mathematical models, when we talk about pandemic progression, are typically a set of differential equations which try to estimate the trajectory of the disease outbreak. It tries to predict how everyday cases are going to be coming up, how numbers will rise, and so on. That is captured through differential equations. The first modelling was done nearly a hundred years ago, in the time of the Spanish flu [the Kermack-McKendrick SIR model]. Since then, it has become quite popular, and there are multiple variations. But, at the heart, it is the same SIR model which is used to predict the trajectory. 

2. What does SUTRA tell us, and what does it not? 

Take the case of the parameter beta. What our model will show is that beta has changed. Identifying the underlying cause behind that is something our model is unable to do. To understand that, one has to look to the ground and see what has shifted, and then correlate one to the other. Humans have to do that. What our model can do is flag the change in particular parameters.

3. The efficacy of lockdowns is currently one of the most hotly debated topics related to the pandemic. What does SUTRA data from India say about that?

We did a state-wide analysis of lockdowns during the second wave. In the first wave, there was uniform lockdown across the country, so there was nothing much available to compare. In the second wave, different states adopted different levels of lockdown. Our conclusion from SUTRA analysis is that a very strict lockdown is not substantially better than a medium-level lockdown. [In the latter] People can go to offices and other different places, even when it does not allow indoor crowding activities and such. Lockdowns of all kinds reduce beta. That is a given. What we found though was that the reduction in beta achieved by a medium type of lockdown was as good as the ones achieved by a strict lockdown [where except for emergency services, nothing much is permitted].

4. One of the criticisms against SUTRA was that it failed to consider some of the larger population dynamics, how different demographics interacted with each other, and the question of the Infection Fatality Rate [the proportion of deaths among infected].

Having too many parameters introduces more error into the system. Inferring those parameter values becomes too challenging. [The point was that] Different states followed different trajectories, for different reasons. Whichever path they took, they tried to do a good job. Some people criticise Kerala [because of high infection rate], as they also do for other states, but the decisions are taken in good faith. We can follow the pandemic trajectory of the different states, but what we learn from one state cannot be translated into another.

5. Which states, would you say, are the most interesting studies?

The most fascinating one has been Kerala, because of the very long tail of infections that they have been facing. The trade-off for them was that they reduced the peak, but at the cost of having a very long and extended tail. [Should they have gone by that plan?] There are pros and cons to all decisions. They are perhaps the only state to have adopted such a strategy; Maharashtra is perhaps another one, but it is somewhere in between, with a tail that is longish, but not as long as Kerala’s. Uttar Pradesh is another interesting state. UP and Kerala are poles apart in their dealing with the pandemic, with UP having adopted a strategy of ‘chasing the pandemic’.

6. Could you elaborate on that a little?

Everybody knows about the Test Positivity Rate (TPR) [the percentage of coronavirus tests that turn out positive], and there is a general understanding that when TPR goes above 10 per cent, the pandemic is spreading very fast. But, can we infer something from the TPR about the testing strategy being used by a state? Not really. Looking at Test Positivity Rate doesn't tell us about testing strategies. So, what I came up with is a Normalised Test Positivity Rate [NTPR], which is TPR divided by the percentage of infected population at the time. If the ratio is close to 1, it means the testing strategy is random. If it is well above 1, the testing strategy is targeted. If NTPR is less than 1, you are testing a large number of people [not just symptomatic ones or susceptible ones], and ‘chasing the pandemic’. In fact, Kerala, in the first year of the pandemic [in 2020], had NTPR way below 1 because they were chasing the pandemic, and were able to control the spread very successfully. But, their strategy later changed from chasing the pandemic to targeted testing. The NTPR for Kerala is now at 3, or somewhere higher, and has been thereabouts for the past several months. So, it was possibly a conscious decision to no longer chase the pandemic, and instead focus on symptomatic cases. In UP, NTPR has been less than 0.5, right from the beginning of the pandemic to this day. That is very fascinating. I never expected that to be the case. The data suggests that the state has been chasing the pandemic throughout. That was a very interesting learning.
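[The NTPR described in this answer is a simple ratio, and can be sketched in Python as below. The function names, the `tolerance` band used to decide what counts as "close to 1", and the sample figures are assumptions made for illustration; the interview gives no exact thresholds.]

```python
def normalised_tpr(tpr_percent, infected_percent):
    """NTPR = test positivity rate / percentage of population infected."""
    if infected_percent <= 0:
        raise ValueError("infected_percent must be positive")
    return tpr_percent / infected_percent

def describe_strategy(ntpr, tolerance=0.2):
    """Label a testing strategy; `tolerance` is a hypothetical cut-off."""
    if ntpr > 1 + tolerance:
        return "targeted"
    if ntpr < 1 - tolerance:
        return "chasing the pandemic"
    return "roughly random"

# Made-up example: 12 per cent of tests positive while 4 per cent are infected.
example_ntpr = normalised_tpr(12.0, 4.0)
example_label = describe_strategy(example_ntpr)
```

[On these made-up inputs the ratio is 3, which falls in the targeted range, while a value of 0.5 would be labelled as chasing the pandemic.]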

7. Would you recommend change in strategy for any state with the data you have in hand? 

I am not saying that one strategy is better. I am just observing which state is following which strategy.

8. Your model factors in the question of incomplete/inaccurate infection data, with one of the most interesting findings being that the reported data mapped so well to the reality, almost in multiples. But, will this same phenomenon hold for foreign countries? If we were to try to export SUTRA to the global stage, maybe Latin America or Africa, would the same phenomenon [the strong relation between reported data and reality] hold there?

Absolutely. We have data from these countries, and the same phenomenon is observed there also. Now, the interesting point is, people keep talking about the poor quality of infection data coming out of India; I invite their attention to the data coming out of countries like the US. There are such jerks in the curve that it is clearly a case of poor reporting. Indian data is so smooth. Sometimes, we assume that just because our systems are not as good as those of many other countries, we must be doing very badly. But, this is one place where I think we have done better than many other countries.

9. With a surge in Omicron variant cases being reported, what is your outlook?

We did a simulation for South Africa, and found something interesting. The value of beta in South Africa jumped from 0.5 to 1 in the month of August. What this data would strongly suggest, I would say, is that Omicron has been active there from September. And, it could be twice as infectious as Delta, because beta multiplied by a factor of 2. Also, this [raises many possibilities]. What if I had run a simulation for South Africa in September? We could have flagged that something weird was happening, with their beta making [an unprecedented] jump to 1. There is a lot of value I see in our kind of modelling that allows us to understand the behaviour of the pandemic, much before it actually shows in numbers. In September, coronavirus numbers in South Africa were still coming down. But, beta jumped.

10. What are your expectations for India?

We are continuously monitoring India, and there has been no jump so far. But there will be a jump. Our expectations are that it won't cause much damage, going by data we have from South Africa.

11. The coronavirus is a phenomenon unlike anything we have seen before. How do the large number of variants affect the core mathematical premises of SUTRA? For instance, does the first basic equation [S+U+T+R1+R2=1] hold when, because of the rise of multiple variants and the threat of immune escape and reinfections, one person can simultaneously be in the recovered [R] category and the susceptible [S] category?

What you have pointed out is true. This is analogous to, or indistinguishable from, the situation where a person is in the recovered category, and a duplicate of this person exists in the susceptible category. That is basically the same as a situation where the population has grown. In our model, this is captured by the fact that the reach parameter (ρ/rho) has expanded to include one more person. This makes reach [rho] go beyond one. That is how such a situation gets factored in. The model can capture that and this also tells us what fraction of people have lost immunity because of new mutants.
