Optimizing protein stability using random mutations
Stability optimization is a registered problem
poli has a get_problems() method that lists the problems you can create.
from poli import get_problems
get_problems()
['aloha',
'dockstring',
'drd3_docking',
'foldx_rfp_lambo',
'foldx_sasa',
'foldx_stability',
'foldx_stability_and_sasa',
'gfp_cbas',
'gfp_select',
'penalized_logp_lambo',
'rasp',
'rdkit_logp',
'rdkit_qed',
'rfp_foldx_stability_and_sasa',
'sa_tdc',
'super_mario_bros',
'white_noise',
'toy_continuous_problem']
As you can see, foldx_stability is already available in the repository. Let’s stick with it as the problem name:
problem_name = "foldx_stability"
Optimizing mRouge
In this example, we focus on optimizing mRouge, also known as 3NED, one of the red fluorescent proteins explored in LaMBO [Stanton et al., 2022]. Before optimizing, we need to download its PDB file and “repair” it with foldx (see single mutations using foldx).
We assume that the repaired file is already in the working directory:
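If you still need to produce the repaired file, the sketch below shows one way to do it from Python. It rests on several assumptions. First, the FoldX 5 command-line syntax (foldx --command=RepairPDB --pdb=<file>). Second, an executable named foldx on your PATH. Third, the RCSB download URL pattern https://files.rcsb.org/download/<ID>.pdb. The helper names (download_pdb, repair_command, repaired_path, repair_pdb) are hypothetical and are not part of poli.

```python
import subprocess
import urllib.request
from pathlib import Path

# Assumption: RCSB serves PDB files at this URL pattern.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def download_pdb(pdb_id: str, out_dir: Path) -> Path:
    """Hypothetical helper: download a PDB file from RCSB into out_dir."""
    out_path = out_dir / f"{pdb_id.lower()}.pdb"
    urllib.request.urlretrieve(RCSB_URL.format(pdb_id=pdb_id.upper()), out_path)
    return out_path


def repair_command(pdb_path: Path, foldx_binary: str = "foldx") -> list[str]:
    """Hypothetical helper: build a RepairPDB call (assumes FoldX 5 CLI syntax)."""
    return [foldx_binary, "--command=RepairPDB", f"--pdb={pdb_path.name}"]


def repaired_path(pdb_path: Path) -> Path:
    """FoldX names the repaired structure <stem>_Repair.pdb (e.g. 3ned_Repair.pdb)."""
    return pdb_path.with_name(f"{pdb_path.stem}_Repair.pdb")


def repair_pdb(pdb_path: Path, foldx_binary: str = "foldx") -> Path:
    """Hypothetical helper: run RepairPDB next to the PDB file and return the output path."""
    subprocess.run(
        repair_command(pdb_path, foldx_binary),
        cwd=pdb_path.parent,  # FoldX reads and writes in its working directory
        check=True,  # raise if foldx exits with an error
    )
    return repaired_path(pdb_path)
```

With these helpers, repair_pdb(download_pdb("3NED", Path("."))) would produce 3ned_Repair.pdb, the file name used below. Keep in mind that RepairPDB can take a while to run.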
!ls
3ned_Repair.pdb optimizing_protein_stability.ipynb
from pathlib import Path
wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists() # Should say True
True
Defining the objective function
In this tutorial, we optimize the stability of mRouge using the foldx_stability black box. The first step is creating it:
Warning
foldx-related black boxes require foldx to be properly installed. Check our documentation on installing foldx.
from poli.objective_repository import FoldXStabilityProblemFactory

problem_factory = FoldXStabilityProblemFactory()
problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0
poli 🧪: Starting the function foldx_stability as an isolated process.
problem_factory.create returns a Problem instance. Problems have the following useful attributes:
- a black-box function in problem.black_box. In this case, it is a FoldXStabilityBlackBox,
- an initial design in problem.x0: np.ndarray, and
- all the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in problem.info: BlackBoxInformation.
These are all the ingredients an abstract solver needs to work. The next section shows how to use a baseline solver, which can easily be replaced by any other solver you implement (as long as it inherits from the AbstractSolver in poli_baselines.core.abstract_solver).
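To make these three attributes concrete, here is a toy stand-in. It mimics the shapes suggested by this tutorial’s outputs:
- designs are NumPy arrays of single characters with shape (batch, sequence length), which is why the best solution is later turned into a string with ''.join(...flatten()), and
- the black box returns an array of shape (batch, 1), like the array([[9.46959]]) printed at the end.

ToyProblem, toy_black_box, its scoring rule and the info dictionary are hypothetical illustrations. They are not poli’s Problem, FoldXStabilityBlackBox or BlackBoxInformation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

import numpy as np

# The 20 standard amino acids, used here as the alphabet.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")


def toy_black_box(x: np.ndarray) -> np.ndarray:
    """Hypothetical score: fraction of hydrophobic residues per sequence.

    x has shape (batch, length) with one amino acid per cell;
    the result has shape (batch, 1), like a single-output black box.
    """
    hydrophobic = np.isin(x, list("AILMFVW"))
    return hydrophobic.mean(axis=1, keepdims=True)


@dataclass
class ToyProblem:
    """Hypothetical container playing the same three roles as a poli Problem."""

    black_box: Callable[[np.ndarray], np.ndarray]
    x0: np.ndarray
    info: Dict[str, object] = field(default_factory=dict)


wildtype = "MVSKGEE"  # hypothetical short sequence, not the real mRouge
problem = ToyProblem(
    black_box=toy_black_box,
    x0=np.array([list(wildtype)]),  # shape (1, 7), one character per cell
    info={"alphabet": AMINO_ACIDS, "deterministic": True},
)
f, x0 = problem.black_box, problem.x0
print(x0.shape, f(x0))
```

Any solver written against this interface only calls f on arrays shaped like x0, so swapping the toy for the real problem.black_box does not change the solver code.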
Optimizing using a RandomMutation solver
In this tutorial, we use the simplest baseline for discrete sequence optimization: RandomMutation. It takes the best-performing sequence, selects a position at random, and replaces the element at that position with another element of the alphabet.
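This mutation step can be sketched in a few lines of NumPy. The sketch makes two assumptions of our own: designs are (1, L) arrays of characters, and the replacement is drawn uniformly from the other letters of the alphabet. It is not the source code of poli_baselines’ RandomMutation, and random_mutation is a hypothetical name.

```python
import numpy as np


def random_mutation(x_best: np.ndarray, alphabet: list, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of x_best (shape (1, L)) with exactly one position changed."""
    x_new = x_best.copy()  # never modify the best design in place
    # Pick a position uniformly at random.
    position = rng.integers(x_new.shape[1])
    current = x_new[0, position]
    # Replace it with a *different* element of the alphabet.
    choices = [a for a in alphabet if a != current]
    x_new[0, position] = rng.choice(choices)
    return x_new


rng = np.random.default_rng(0)
alphabet = list("ACDEFGHIKLMNPQRSTVWY")
x_best = np.array([list("MVSKGEE")])  # hypothetical short sequence
x_next = random_mutation(x_best, alphabet, rng)
print("".join(x_best.flatten()), "->", "".join(x_next.flatten()))
```

Because the new letter is drawn only from the letters that differ from the current one, every proposal is guaranteed to be a genuine single-point mutant.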
Note
There’s nothing special about RandomMutation here. You could drop in any solver you implement, as long as it:
- inherits from AbstractSolver in poli_baselines.core.abstract_solver, and
- implements the abstract method next_candidate() -> np.ndarray.
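To illustrate the contract in this note, here is a self-contained sketch with two parts:
- an abstract solver that keeps a history dictionary and runs a solve(max_iter) loop, and
- a drop-in subclass that only implements next_candidate().

Everything in it is a hypothetical stand-in written for illustration: SketchAbstractSolver, SiteSweepSolver, count_a, and the way history is updated. poli_baselines’ actual AbstractSolver may differ in its details, so inherit from the real class in practice.

```python
from abc import ABC, abstractmethod
from typing import Callable

import numpy as np


class SketchAbstractSolver(ABC):
    """Hypothetical stand-in for poli_baselines' AbstractSolver."""

    def __init__(self, black_box: Callable, x0: np.ndarray, y0: np.ndarray):
        self.black_box = black_box
        # The history starts with the initial design and its value.
        self.history = {"x": [x0], "y": [y0]}

    @abstractmethod
    def next_candidate(self) -> np.ndarray:
        """Propose the next design to evaluate."""

    def solve(self, max_iter: int) -> None:
        # Propose, evaluate, and record one design per iteration.
        for _ in range(max_iter):
            x = self.next_candidate()
            y = self.black_box(x)
            self.history["x"].append(x)
            self.history["y"].append(y)

    def get_best_performance(self) -> np.ndarray:
        return max(self.history["y"], key=lambda y: y.max()).flatten()

    def get_best_solution(self) -> np.ndarray:
        best = int(np.argmax([y.max() for y in self.history["y"]]))
        return self.history["x"][best]


class SiteSweepSolver(SketchAbstractSolver):
    """Hypothetical drop-in: mutates the best design one site at a time, in order."""

    def __init__(self, black_box, x0, y0, alphabet):
        super().__init__(black_box, x0, y0)
        self.alphabet = alphabet
        self.step = 0

    def next_candidate(self) -> np.ndarray:
        x = self.get_best_solution().copy()
        length = x.shape[1]
        # Cycle through positions; move to the next letter after each full sweep.
        position = self.step % length
        x[0, position] = self.alphabet[(self.step // length) % len(self.alphabet)]
        self.step += 1
        return x


def count_a(x: np.ndarray) -> np.ndarray:
    """Toy objective: number of "A" residues per design (higher is better)."""
    return (x == "A").sum(axis=1, keepdims=True).astype(float)


x0 = np.array([list("CCC")])
solver = SiteSweepSolver(black_box=count_a, x0=x0, y0=count_a(x0), alphabet=["A", "C"])
solver.solve(max_iter=3)
print(solver.get_best_performance(), "".join(solver.get_best_solution().flatten()))
```

The only method the subclass writes is next_candidate(). The loop, the history bookkeeping and the best-so-far queries all live in the base class, which is what makes solvers interchangeable.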
from poli_baselines.solvers.simple.random_mutation import RandomMutation

# Evaluate the initial design once so the solver starts from a known value.
y0 = f(x0)

solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)
And that’s it! You can optimize the objective function passed as black_box by calling the .solve(max_iter) method (be careful, this might take a while):
solver.solve(max_iter=3)
Checking the results
After optimization, the results are stored in solver.history, which is a dictionary with "x" and "y" keys. Let’s check what the best optimization result was:
print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.46959]]), array([[10.4687]]), array([[9.14886]]), array([[6.56841]])]
best stability: [10.4687]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGIMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK
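The same information can be read directly from solver.history. The sketch below uses the four y values printed above. It assumes that the first entry corresponds to the initial design x0, as the four entries produced with max_iter=3 suggest. The x values are not printed in this tutorial, so the short sequences below are hypothetical placeholders. The mutations helper is also our own illustration: it lists changes relative to the initial design in the common wildtype-residue, 1-based position, new-residue notation (e.g. G5A).

```python
import numpy as np

history = {
    "x": [
        np.array([list("MVSKGEE")]),  # hypothetical placeholders, not the real designs
        np.array([list("MVSKAEE")]),
        np.array([list("MVSKAEL")]),
        np.array([list("MISKAEE")]),
    ],
    # y values as printed above; each entry has shape (1, 1).
    "y": [
        np.array([[9.46959]]),
        np.array([[10.4687]]),
        np.array([[9.14886]]),
        np.array([[6.56841]]),
    ],
}

ys = np.concatenate(history["y"]).flatten()  # shape (4,)
best = int(np.argmax(ys))  # higher is better here, as get_best_performance() shows
print(f"best stability: {ys[best]} (entry {best})")


def mutations(wildtype: str, design: str) -> list:
    """List point mutations as <wildtype residue><1-based position><new residue>."""
    return [f"{w}{i + 1}{d}" for i, (w, d) in enumerate(zip(wildtype, design)) if w != d]


wt = "".join(history["x"][0].flatten())
best_seq = "".join(history["x"][best].flatten())
print(mutations(wt, best_seq))
```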