Exclusive: Berkeley Team Adopts AI for Environmental Science
Nitrogen oxides (NOx) are the pollutants behind the infamous LA smog. But scientists had not been able to fully crack their underlying mechanism until very recently.
There is overwhelming evidence that human activities, such as the burning of fossil fuels, have created long-lasting environmental challenges, including air pollution. According to the WHO, air pollution kills at least 7 million people a year, while a recent study estimated 8.7 million early deaths a year from coal, oil and gas burning – 20% of all deaths. Among the pollutants from fuels, nitrogen oxides (NOx) are particularly notorious for causing the infamous Los Angeles smog of the 1950s to 1970s, which subsequently pushed Congress to establish an entirely new regulatory entity, now known as the EPA.
Given the severity of the issue, scientists have long sought to tackle the challenge. However, a fundamental understanding of NOx's reactions is still lacking, preventing more systematic solutions from being developed. In particular, the mechanism of NOx's reactive uptake, which dictates its concentration in the air, remained a mystery until very recently. The Limmer Research Group at UC Berkeley revealed, in a paper published in Science, a method of using AI to unveil the reactive uptake mechanism of NOx.
We had the privilege of sitting down with Dr. Limmer to talk about his work. Here are some highlights.
Q: Why is NOx reactive uptake an important and challenging topic?
There is an outstanding challenge in atmospheric chemistry associated with where all the NOx compounds go. While a lot is known at the macro level about how many N2O5 molecules get irreversibly taken up by a droplet, the molecular details are still lacking.
This is a challenging system to study, because the reaction occurs quickly. N2O5 is a gaseous molecule, and it doesn't have much time to get into the bulk of the aerosol, into the interior. That means a lot of the reactivity likely happens at the interface.
This is what really makes it difficult. To begin with, a reactive system requires quantum mechanical simulation, which is costly. On top of that, representing a reaction at an interface adds further challenges, because it requires a system big enough to differentiate between the interfacial region and the bulk liquid behind it. Needing a big enough system and needing a quantum mechanical description of that system really made it intractable. It just couldn't be done using traditional methods.
Q: What has the experience of using AI for Science been like?
There are two distinct problems within this challenge. One is the representation problem (i.e., how do you describe the problem setting to a computer); the other is the rare event problem (i.e., if the event you want to observe is rare, it would take a computer a prohibitively long time to simulate).
If you want to study chemistry at an interface, you have to worry about representing a big enough system. I had known about Deep Potential Molecular Dynamics (DeePMD) since I was at Princeton, right as the early work on it was being done by Weinan E and his group. It was the right tool, because if you can construct a neural network potential, you can get to a very large system very effectively.
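For a concrete picture of the idea, here is a toy Python sketch of a neural-network potential (illustrative only; the descriptor and tiny network below are stand-ins, not DeePMD's actual Deep Potential architecture). The key point is that the total energy is written as a sum of per-atom contributions, each predicted from that atom's local neighborhood, so the cost grows roughly linearly with system size.

```python
# Toy sketch of a neural-network potential (illustrative; not the actual
# DeePMD architecture). Total energy = sum of per-atom energies, each
# predicted from a local-environment descriptor, so cost scales roughly
# linearly with the number of atoms.
import numpy as np

CUTOFF = 6.0  # Angstrom: only neighbors within this radius contribute (locality)

def local_descriptor(positions, i, n_bins=16):
    """Histogram of neighbor distances around atom i within CUTOFF.
    Built from distances only, so it ignores absolute position/orientation."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    d = d[(d > 1e-8) & (d < CUTOFF)]
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, CUTOFF))
    return hist.astype(float)

def atomic_energy(desc, W1, b1, W2, b2):
    """Tiny fully connected network mapping a descriptor to one atom's energy."""
    h = np.tanh(desc @ W1 + b1)
    return float(h @ W2 + b2)

def total_energy(positions, params):
    """Sum of per-atom contributions; forces would be its negative gradient."""
    return sum(atomic_energy(local_descriptor(positions, i), *params)
               for i in range(len(positions)))

# Random (untrained) parameters, just to show the shapes involved
rng = np.random.default_rng(0)
params = (rng.normal(size=(16, 32)), rng.normal(size=32),
          rng.normal(size=32), 0.0)
positions = rng.uniform(0.0, 20.0, size=(500, 3))
print(total_energy(positions, params))
```

Because each atom's energy depends only on a finite neighborhood, evaluating such a model stays cheap even for systems with many thousands of atoms, which is what makes an interfacial slab with a distinct bulk region behind it tractable.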
At the same time, you also have to wait for the reactive event to occur. In most of chemistry, reactive events at the atomic level are extremely rare relative to a molecule's own motion. A reaction might take nanoseconds, whereas the typical time scale for a molecule to diffuse its own diameter is a picosecond, 1,000x faster. That gets worse and worse the rarer the event you want to see.
In this study, we observed that the wait time for a reaction is long because the molecules move slowly and have to get into just the right conformation for the reaction to occur. But once they are in the right position, it happens very quickly. Our workaround was to put in a bias for the molecules to go where they want to react, and then account for that bias in a statistical sense, so that we end up with an ultimately unbiased estimate of how long a reaction would take.
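The reweighting step can be illustrated with a short Python sketch. What follows is the generic importance-sampling identity, not the paper's exact enhanced-sampling protocol: frames collected under an added bias potential V_bias are reweighted by exp(+V_bias / kT) to recover unbiased ensemble averages.

```python
# Generic reweighting of biased samples (illustrative; not the paper's exact
# enhanced-sampling protocol). Frames generated with an added bias V_bias are
# reweighted by exp(+V_bias / kT) to recover unbiased ensemble averages.
import numpy as np

def unbiased_average(observable, v_bias, kT=1.0):
    """Unbiased ensemble average of `observable` from biased frames.

    observable: per-frame values of the quantity of interest
    v_bias:     per-frame values of the applied bias potential
    """
    log_w = np.asarray(v_bias, dtype=float) / kT
    log_w -= log_w.max()                      # shift for numerical stability
    w = np.exp(log_w)
    return float(np.sum(w * np.asarray(observable)) / np.sum(w))

# Toy check: the unbiased system is a standard normal (potential x^2/2, kT=1),
# and a linear bias V_bias(x) = -0.5*x tilts sampling toward larger x, so the
# biased run is distributed as N(0.5, 1). Reweighting recovers the unbiased
# tail probability P(x > 2.5), which is about 0.006.
rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=200_000)        # frames from the biased run
v_bias = -0.5 * x                             # bias evaluated at each frame
print(unbiased_average(x > 2.5, v_bias))      # ~0.006
```

In the same spirit, biasing the simulation toward reactive configurations and then dividing out the bias lets rare reactive events be observed in affordable simulation time without corrupting the statistics.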
The methodology yielded orders-of-magnitude improvements in efficiency. The savings for the calculations we did, even including training and everything, was something like a factor of 1,000. In reality, my postdoc spent two years training these models and running all the calculations. So the traditional methods would have taken a thousand years, right? Something of that order. Maybe, if I was really lucky and got access to a whole supercomputer, you could cut that down by a factor of ten, but it's still a hundred years.
Q: Intuitively, how is using AI for science different from other popular AI applications (such as AlphaGo)?
With the strategies one uses in AI for something like AlphaGo, there are many steps before you get an indication of success. In AlphaGo, you need to train a computer to develop some strategy, and only after many moves do you know whether that strategy is going to pay off. In some sense, science is a much simpler kind of problem for AI.
Loosely speaking, physics acts like a guardrail for AI so it does not wander off. In this study, the researchers were able to put in a lot of physical constraints to make the AI's training much easier. For example, the forces between atoms shouldn't depend on things like the absolute coordinate system, so translational and rotational invariance can be built into the machine learning scheme. Another important constraint is locality: because particles only interact within a finite range, the AI doesn't have to learn a global potential energy surface, which drastically reduces the learning difficulty.
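As a small, self-contained illustration of what building in those symmetries buys (toy code, not taken from the study): a descriptor constructed only from interatomic distances inside a finite cutoff is automatically unchanged by rigid translations and rotations, and only ever sees a local neighborhood, so the model never has to learn either fact from data.

```python
# Toy check (not from the study): a descriptor built purely from neighbor
# distances within a cutoff is invariant to rigid translations/rotations
# and local by construction.
import numpy as np

CUTOFF = 6.0  # only neighbors within this radius are visible (locality)

def descriptor(positions, i):
    """Sorted neighbor distances of atom i within CUTOFF."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    return np.sort(d[(d > 1e-8) & (d < CUTOFF)])

rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 15.0, size=(100, 3))

# Rotate the whole system about the z-axis and translate it
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
moved = pos @ R.T + np.array([3.0, -2.0, 1.5])

# Same descriptor before and after the rigid motion: the model's input,
# and hence its predictions, cannot depend on the absolute frame.
print("invariant:", np.allclose(descriptor(pos, 0), descriptor(moved, 0)))
```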
For most problems, the key might be finding a middle ground between exploiting the flexibility afforded by a neural network and bringing in physical intuition.
Q: What's next for AI in Environmental Science?
There's an immediate next step. Our work indicates that the longstanding perspective was wrong: the majority of the reaction really does happen at the interface, not in the bulk.
While exciting, this comes with a big challenge: previous work built on the old perspective now needs to be revisited and reevaluated. The short-term goal is to figure out how to generalize this perspective to the full complexity of the atmosphere.
The longer-term goal, at least on the DeePMD side, is to keep fitting reactive models, because they have really changed the game for what we can study.
From my perspective, I was mostly motivated by the basic physics and chemistry of how reactions can be different at interfaces. For example, cellular environments are almost all interfaces. There are proteins and membranes and all sorts of things catalyzing reactivity there. We haven't had the tools to study any of those things. So part of my group is starting to deploy these same sorts of methodologies for biochemical reactions, like how cells start manufacturing proteins. That's something that would be very difficult to simulate without neural network force fields.
--
Final thoughts
Beyond Dr. Limmer's work, the adoption of AI for Science is helping shed light on various previously intractable areas, such as:
§ Particle Physics, Earth Science (high temperatures, pressures and other extreme conditions)
§ Protein folding, high-entropy alloys (complexity due to scale)
§ Battery solid electrolyte interphase, catalysis (complexity due to interfaces)
These works might seem far removed from our daily lives, but they are in fact critical for building a prosperous, sustainable green future by enabling more efficient energy generation & storage, faster drug discovery, and more durable, environmentally friendly materials & products. With AI for Science, we might finally get the promised flying cars, instead of just the 140 characters.