These desperate efforts ask parents to overcome nearly impossible obstacles. They must become experts in drug development, raise millions, and tirelessly cajole scientists. Few people can pull it off.
“There are a lot of people who know how to do gene therapy, but the knowledge is all fragmented, and so much can go wrong,” says Sanath Kumar Ramesh, a software developer whose son is afflicted by a different rare disease. Ramesh founded an organization, Open Treatments, that is building software families can use to organize gene-therapy research, including steps such as hiring scientists to create animal models of an illness.
“I think in the future, the distinction between scientists and parents is going to be blurred,” he says.
For parents whose kids have already been accepted into the Dayton trial, gene therapy may be their last chance. One of them is Meagan Rockwell, a nail technician in Cedar Rapids, Iowa, whose daughter, Tobin Grace, now three and half, was diagnosed with Canavan in 2018.
“They told us sorry, there is nothing we can do—no treatment, no cure—you will be lucky if she sees her fifth birthday. It was a hard blow, to know your only child has a life-limiting brain disease,” Rockwell says.
Rockwell says she found out about Leone’s gene-therapy effort online and eventually raised more than $250,000. “At the time, Tobin was the youngest person in the US with Canavan, and I think that played a huge factor in her acceptance,” she says, adding that Leone tells parents money puts them at the front of the line but doesn’t guarantee treatment.
Bateman-House, the bioethicist, says another risk is whether parents can really judge the benefits of an experimental procedure in a “dispassionate” way, especially if they have sunk a fortune into the effort. “It’s not only that their child is facing a dangerous condition; it’s that their blood, sweat, and tears is what is funding this intervention,” she says. “It could be incredibly difficult for a parent to change their mind and say ‘We are not going to do this.’“
Hope versus risk
The Dayton study currently has enough supplies of the genetic drug to treat only nine or 10 children. It was manufactured in Spain, but only after the researchers and families overcame what they call an ordeal of red tape, delays, and obstacles, some thrown up by government regulators who decide which genetic treatments can be tried and whether trials are properly planned.
At one point, in 2019, the Landsmans took their sons to the US Food and Drug Administration for a meeting they landed after dozens of calls to lawmakers. “Beforehand we were a case number in their big pile of paper,” says Jennie Landsman, the boys’ mother. “They had very technical objections. In the meeting we held up Benny and Josh, and we said ‘We hope this issue that is so technical isn’t going to stop the treatment.’”
The Dayton trial won a greenlight in December and began barely in time for Benny, who will hit the age cutoff of five years in June. “Benny is the pilot. Benny is the ‘God, we hope this works’ kid,” says Rockwell, who doesn’t yet have a date for her daughter’s procedure.
What’s the chance the therapy works? Gene-replacement techniques have been having notable successes, curing kids who don’t have immune systems, and preventing brain diseases. Since 2017, a small number of gene therapies have also been approved for sale in the US, at prices as high as $2.1 million per child.
Record prices have stoked interest among specialist biotech companies, which now see a business even in super-rare diseases. One, called Aspa Therapeutics, says it has plans to initiate a different Canavan gene-therapy trial. Its CEO, Eric David, estimates there are 1,000 children alive with the disease in the US and Europe. “That, for us, is enough,” he says.
There’s no certainty gene therapy will succeed in Canavan. Even if the corrected gene stops the disease from progressing, the kids’ brains may have already been irreversibly damaged.
Anti-vaxxers are weaponizing Yelp to punish bars that require vaccine proof
Smith’s Yelp reviews were shut down after the sudden flurry of activity on its page, which the company labels “unusual activity alerts,” a stopgap measure for both the business and Yelp to filter through a flood of reviews and pick out which are spam and which aren’t. Noorie Malik, Yelp’s vice president of user operations, said Yelp has a “team of moderators” that investigate pages that get an unusual amount of traffic. “After we’ve seen activity dramatically decrease or stop, we will then clean up the page so that only firsthand consumer experiences are reflected,” she said in a statement.
It’s a practice that Yelp has had to deploy more often over the course of the pandemic: According to Yelp’s 2020 Trust & Safety Report, the company saw a 206% increase over 2019 levels in unusual activity alerts. “Since January 2021, we’ve placed more than 15 unusual activity alerts on business pages related to a business’s stance on covid-19 vaccinations,” said Malik.
The majority of those cases have been since May, like the gay bar C.C. Attles in Seattle, which got an alert from Yelp after it made patrons show proof of vaccination at the door. Earlier this month, Moe’s Cantina in Chicago’s River North neighborhood got spammed after it attempted to isolate vaccinated customers from unvaccinated ones.
Spamming a business with one-star reviews is not a new tactic. In fact, perhaps the best-known case is Colorado’s Masterpiece bakery, which won a 2018 Supreme Court battle for refusing to make a wedding cake for a same-sex couple, after which it got pummeled by one-star reviews. “People are still writing fake reviews. People will always write fake reviews,” Liu says.
But he adds that today’s online audience know that platforms use algorithms to detect and flag problematic words, so bad actors can mask their grievances by blaming poor restaurant service like a more typical negative review to ensure the rating stays up — and counts.
That seems to have been the case with Knapp’s bar. His Yelp review included comments like “There was hair in my food” or alleged cockroach sightings. “Really ridiculous, fantastic shit,” Knapp says. “If you looked at previous reviews, you would understand immediately that this doesn’t make sense.”
Liu also says there is a limit to how much Yelp can improve their spam detection, since natural language — or the way we speak, read, and write — “is very tough for computer systems to detect.”
But Liu doesn’t think putting a human being in charge of figuring out which reviews are spam or not will solve the problem. “Human beings can’t do it,” he says. “Some people might get it right, some people might get it wrong. I have fake reviews on my webpage and even I can’t tell which are real or not.”
You might notice that I’ve only mentioned Yelp reviews thus far, despite the fact that Google reviews — which appear in the business description box on the right side of the Google search results page under “reviews” — is arguably more influential. That’s because Google’s review operations are, frankly, even more mysterious.
While businesses I spoke to said Yelp worked with them on identifying spam reviews, none of them had any luck with contacting Google’s team. “You would think Google would say, ‘Something is fucked up here,’” Knapp says. “These are IP addresses from overseas. It really undermines the review platform when things like this are allowed to happen.”
These creepy fake humans herald a new age in AI
Once viewed as less desirable than real data, synthetic data is now seen by some as a panacea. Real data is messy and riddled with bias. New data privacy regulations make it hard to collect. By contrast, synthetic data is pristine and can be used to build more diverse data sets. You can produce perfectly labeled faces, say, of different ages, shapes, and ethnicities to build a face-detection system that works across populations.
But synthetic data has its limitations. If it fails to reflect reality, it could end up producing even worse AI than messy, biased real-world data—or it could simply inherit the same problems. “What I don’t want to do is give the thumbs up to this paradigm and say, ‘Oh, this will solve so many problems,’” says Cathy O’Neil, a data scientist and founder of the algorithmic auditing firm ORCAA. “Because it will also ignore a lot of things.”
Realistic, not real
Deep learning has always been about data. But in the last few years, the AI community has learned that good data is more important than big data. Even small amounts of the right, cleanly labeled data can do more to improve an AI system’s performance than 10 times the amount of uncurated data, or even a more advanced algorithm.
That changes the way companies should approach developing their AI models, says Datagen’s CEO and cofounder, Ofir Chakon. Today, they start by acquiring as much data as possible and then tweak and tune their algorithms for better performance. Instead, they should be doing the opposite: use the same algorithm while improving on the composition of their data.
But collecting real-world data to perform this kind of iterative experimentation is too costly and time intensive. This is where Datagen comes in. With a synthetic data generator, teams can create and test dozens of new data sets a day to identify which one maximizes a model’s performance.
To ensure the realism of its data, Datagen gives its vendors detailed instructions on how many individuals to scan in each age bracket, BMI range, and ethnicity, as well as a set list of actions for them to perform, like walking around a room or drinking a soda. The vendors send back both high-fidelity static images and motion-capture data of those actions. Datagen’s algorithms then expand this data into hundreds of thousands of combinations. The synthesized data is sometimes then checked again. Fake faces are plotted against real faces, for example, to see if they seem realistic.
Datagen is now generating facial expressions to monitor driver alertness in smart cars, body motions to track customers in cashier-free stores, and irises and hand motions to improve the eye- and hand-tracking capabilities of VR headsets. The company says its data has already been used to develop computer-vision systems serving tens of millions of users.
It’s not just synthetic humans that are being mass-manufactured. Click-Ins is a startup that uses synthetic AI to perform automated vehicle inspections. Using design software, it re-creates all car makes and models that its AI needs to recognize and then renders them with different colors, damages, and deformations under different lighting conditions, against different backgrounds. This lets the company update its AI when automakers put out new models, and helps it avoid data privacy violations in countries where license plates are considered private information and thus cannot be present in photos used to train AI.
Mostly.ai works with financial, telecommunications, and insurance companies to provide spreadsheets of fake client data that let companies share their customer database with outside vendors in a legally compliant way. Anonymization can reduce a data set’s richness yet still fail to adequately protect people’s privacy. But synthetic data can be used to generate detailed fake data sets that share the same statistical properties as a company’s real data. It can also be used to simulate data that the company doesn’t yet have, including a more diverse client population or scenarios like fraudulent activity.
Proponents of synthetic data say that it can help evaluate AI as well. In a recent paper published at an AI conference, Suchi Saria, an associate professor of machine learning and health care at Johns Hopkins University, and her coauthors demonstrated how data-generation techniques could be used to extrapolate different patient populations from a single set of data. This could be useful if, for example, a company only had data from New York City’s more youthful population but wanted to understand how its AI performs on an aging population with higher prevalence of diabetes. She’s now starting her own company, Bayesian Health, which will use this technique to help test medical AI systems.
The limits of faking it
But is synthetic data overhyped?
When it comes to privacy, “just because the data is ‘synthetic’ and does not directly correspond to real user data does not mean that it does not encode sensitive information about real people,” says Aaron Roth, a professor of computer and information science at the University of Pennsylvania. Some data generation techniques have been shown to closely reproduce images or text found in the training data, for example, while others are vulnerable to attacks that make them fully regurgitate that data.
This might be fine for a firm like Datagen, whose synthetic data isn’t meant to conceal the identity of the individuals who consented to be scanned. But it would be bad news for companies that offer their solution as a way to protect sensitive financial or patient information.
Clinical trials are better, faster, cheaper with big data
“One of the most difficult parts of my job is enrolling patients into studies,” says Nicholas Borys, chief medical officer for Lawrenceville, N.J., biotechnology company Celsion, which develops next-generation chemotherapy and immunotherapy agents for liver and ovarian cancers and certain types of brain tumors. Borys estimates that fewer than 10% of cancer patients are enrolled in clinical trials. “If we could get that up to 20% or 30%, we probably could have had several cancers conquered by now.”
Clinical trials test new drugs, devices, and procedures to determine whether they’re safe and effective before they’re approved for general use. But the path from study design to approval is long, winding, and expensive. Today,researchers are using artificial intelligence and advanced data analytics to speed up the process, reduce costs, and get effective treatments more swiftly to those who need them. And they’re tapping into an underused but rapidly growing resource: data on patients from past trials
Building external controls
Clinical trials usually involve at least two groups, or “arms”: a test or experimental arm that receives the treatment under investigation, and a control arm that doesn’t. A control arm may receive no treatment at all, a placebo or the current standard of care for the disease being treated, depending on what type of treatment is being studied and what it’s being compared with under the study protocol. It’s easy to see the recruitment problem for investigators studying therapies for cancer and other deadly diseases: patients with a life-threatening condition need help now. While they might be willing to take a risk on a new treatment, “the last thing they want is to be randomized to a control arm,” Borys says. Combine that reluctance with the need to recruit patients who have relatively rare diseases—for example, a form of breast cancer characterized by a specific genetic marker—and the time to recruit enough people can stretch out for months, or even years. Nine out of 10 clinical trials worldwide—not just for cancer but for all types of conditions—can’t recruit enough people within their target timeframes. Some trials fail altogether for lack of enough participants.
What if researchers didn’t need to recruit a control group at all and could offer the experimental treatment to everyone who agreed to be in the study? Celsion is exploring such an approach with New York-headquartered Medidata, which provides management software and electronic data capture for more than half of the world’s clinical trials, serving most major pharmaceutical and medical device companies, as well as academic medical centers. Acquired by French software company Dassault Systèmes in 2019, Medidata has compiled an enormous “big data” resource: detailed information from more than 23,000 trials and nearly 7 million patients going back about 10 years.
The idea is to reuse data from patients in past trials to create “external control arms.” These groups serve the same function as traditional control arms, but they can be used in settings where a control group is difficult to recruit: for extremely rare diseases, for example, or conditions such as cancer, which are imminently life-threatening. They can also be used effectively for “single-arm” trials, which make a control group impractical: for example, to measure the effectiveness of an implanted device or a surgical procedure. Perhaps their most valuable immediate use is for doing rapid preliminary trials, to evaluate whether a treatment is worth pursuing to the point of a full clinical trial.
Medidata uses artificial intelligence to plumb its database and find patients who served as controls in past trials of treatments for a certain condition to create its proprietary version of external control arms. “We can carefully select these historical patients and match the current-day experimental arm with the historical trial data,” says Arnaub Chatterjee, senior vice president for products, Acorn AI at Medidata. (Acorn AI is Medidata’s data and analytics division.) The trials and the patients are matched for the objectives of the study—the so-called endpoints, such as reduced mortality or how long patients remain cancer-free—and for other aspects of the study designs, such as the type of data collected at the beginning of the study and along the way.
When creating an external control arm, “We do everything we can to mimic an ideal randomized controlled trial,” says Ruthie Davi, vice president of data science, Acorn AI at Medidata. The first step is to search the database for possible control arm candidates using the key eligibility criteria from the investigational trial: for example, the type of cancer, the key features of the disease and how advanced it is, and whether it’s the patient’s first time being treated. It’s essentially the same process used to select control patients in a standard clinical trial—except data recorded at the beginning of the past trial, rather than the current one, is used to determine eligibility, Davi says. “We are finding historical patients who would qualify for the trial if they existed today.”
Download the full report.
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.