Optimizing protein stability using random mutations
In this example, we optimize the thermal stability of a protein by mutating its wildtype sequence. To do so, we use the foldx_stability problem.
Warning
For foldx-related black boxes, you need to have foldx properly installed. Check our documentation on installing foldx.
You can also install all of the dependencies to run it using
pip install poli-core[foldx]
If you have done everything correctly, you should be able to run
~/foldx/foldx --version
Optimizing mRouge
In this example, we focus on optimizing mRouge, also known as 3NED, one of the red fluorescent proteins explored in LaMBO [Stanton et al., 2022]. Before optimization, we need to download its PDB file and “repair” it (see single mutations using foldx).
We assume that the repaired file is already in the working directory.
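For readers who do not have the repaired file yet, the download-and-repair step can be sketched as below. This is a minimal sketch and not the tutorial's own procedure: the helper names are hypothetical, the URL pattern is the public RCSB download endpoint, and the RepairPDB flags should be checked against the FoldX documentation for your version. The helpers only build the URL and the command line; nothing is downloaded or executed.

```python
from pathlib import Path
from typing import List

# Public RCSB endpoint for PDB files (assumption: you fetch the structure from RCSB).
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Build the download URL for a given PDB ID (hypothetical helper)."""
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id.upper())


def repair_command(foldx_binary: Path, pdb_file: Path) -> List[str]:
    """Build a FoldX RepairPDB command line (flags to be verified for your FoldX version)."""
    return [str(foldx_binary), "--command=RepairPDB", f"--pdb={pdb_file.name}"]


def repaired_name(pdb_file: Path) -> str:
    """FoldX writes the repaired structure next to the input with a _Repair suffix."""
    return f"{pdb_file.stem}_Repair.pdb"
```

The command list can be passed to subprocess.run from the directory that contains the PDB file; for 3ned.pdb the expected result is the 3ned_Repair.pdb file used in the rest of this example.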
!ls
3ned_Repair.pdb optimizing_protein_stability.ipynb
from pathlib import Path
wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists() # Should say True
True
Defining the objective function
In this tutorial, we optimize the stability of mRouge using the foldx_stability black box. The first step is creating it:
from poli.objective_repository import FoldXStabilityProblemFactory
problem_factory = FoldXStabilityProblemFactory()
problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0
poli 🧪: Starting the function foldx_stability as an isolated process.
problem_factory.create returns a Problem instance. Problems have the following useful attributes:
- a black-box function in problem.black_box. In this case, it is a FoldXStabilityBlackBox.
- an initial design in problem.x0: np.ndarray, and
- all the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in problem.info: BlackBoxInformation.
These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the AbstractSolver in poli_baselines.core.abstract_solver).
Optimizing using a RandomMutation solver
In this tutorial we use the simplest baseline for discrete sequence optimization: RandomMutation, which takes the best-performing sequence and mutates it by selecting a position at random and replacing the token there with another element of the alphabet.
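The mutation step described above can be sketched in plain Python. This is an illustration of the idea only, not the poli-baselines implementation: the function name is hypothetical, it works on lists of one-letter tokens rather than np.ndarray, and it always picks a token different from the current one, which may not match the library's exact behaviour.

```python
import random
from typing import List, Optional, Sequence


def random_mutation_step(
    best_sequence: Sequence[str],
    alphabet: Sequence[str],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of best_sequence with exactly one position changed."""
    rng = rng or random.Random()
    candidate = list(best_sequence)  # copy, so the parent is left untouched
    position = rng.randrange(len(candidate))  # position chosen uniformly at random
    # Assumption of this sketch: the replacement must differ from the current token.
    alternatives = [token for token in alphabet if token != candidate[position]]
    candidate[position] = rng.choice(alternatives)
    return candidate
```

Calling it with the current best sequence and the 20 amino-acid letters yields a candidate that differs from its parent at a single position; a solver would then evaluate that candidate with the black box and keep it if it scores better.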
Note
There’s nothing special about RandomMutation here. You could drop in any solver you implement, as long as it
- inherits from AbstractSolver in poli_baselines.core.abstract_solver, and
- implements the abstract method next_candidate() -> np.ndarray.
from poli_baselines.solvers.simple.random_mutation import RandomMutation
y0 = f(x0)
solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 from poli_baselines.solvers.simple.random_mutation import RandomMutation
3 y0 = f(x0)
5 solver = RandomMutation(
6 black_box=f,
7 x0=x0,
8 y0=y0,
9 )
File ~/Projects/poli-baselines/src/poli_baselines/solvers/simple/random_mutation.py:22
17 from poli.core.abstract_black_box import AbstractBlackBox
19 from poli_baselines.core.step_by_step_solver import StepByStepSolver
---> 22 class RandomMutation(StepByStepSolver):
23 def __init__(
24 self,
25 black_box: AbstractBlackBox,
(...)
33 tokenizer: Callable[[str], list[str]] | None = None,
34 ):
35 if x0.ndim == 1:
File ~/Projects/poli-baselines/src/poli_baselines/solvers/simple/random_mutation.py:32, in RandomMutation()
22 class RandomMutation(StepByStepSolver):
23 def __init__(
24 self,
25 black_box: AbstractBlackBox,
26 x0: np.ndarray,
27 y0: np.ndarray,
28 n_mutations: int = 1,
29 top_k: int = 1,
30 batch_size: int = 1,
31 greedy: bool = True,
---> 32 alphabet: list[str] | None = None,
33 tokenizer: Callable[[str], list[str]] | None = None,
34 ):
35 if x0.ndim == 1:
36 if tokenizer is None:
TypeError: unsupported operand type(s) for |: 'types.GenericAlias' and 'NoneType'
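The TypeError in the traceback above comes from evaluating the annotation list[str] | None. The | operator on built-in generics was added in Python 3.10 (PEP 604), so this error suggests the cell was run on an older interpreter. The sketch below is independent of poli-baselines: it checks whether the running interpreter supports that syntax and shows the typing.Optional spelling that also works on older versions. Both function names are illustrative only.

```python
import sys
from typing import List, Optional


def supports_union_operator_on_generics() -> bool:
    """Return True if `list[str] | None` can be evaluated at runtime."""
    try:
        list[str] | None  # raises TypeError on Python older than 3.10
    except TypeError:
        return False
    return True


def count_tokens(alphabet: Optional[List[str]] = None) -> int:
    """Hypothetical helper annotated in the style that older Pythons accept."""
    return 0 if alphabet is None else len(alphabet)


print(sys.version_info[:2], supports_union_operator_on_generics())
```

If the check prints False, running the tutorial on Python 3.10 or newer is the likely fix, although that is an inference from the traceback rather than something the tutorial states.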
And that’s it! You can optimize the objective function passed as black_box by calling the .solve(max_iter) method (be careful, this might take a while):
solver.solve(max_iter=3)
Checking the results
After optimization, the results are stored inside solver.history, which is a dictionary with "x" and "y" keys. Let’s check what the best optimization result was:
print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.41639]]), array([[7.3365]]), array([[9.87284]]), array([[4.63237]])]
best stability: [9.87284]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLIGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGEMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK
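To make the relationship between the history and the reported best value explicit, here is a small stand-alone sketch. It assumes a history-like dictionary with "x" and "y" lists of equal length, uses plain floats instead of the nested arrays shown above, and uses placeholder labels instead of the real sequences; best_of_history is a hypothetical helper, not part of poli-baselines.

```python
from typing import Dict, List, Tuple


def best_of_history(history: Dict[str, List]) -> Tuple[str, float]:
    """Return the (x, y) pair with the largest y in a history-like dict."""
    ys = history["y"]
    best_index = max(range(len(ys)), key=lambda i: ys[i])
    return history["x"][best_index], ys[best_index]


# y values copied from the output above; x entries are placeholders for the sequences.
history = {
    "x": ["seq_0", "seq_1", "seq_2", "seq_3"],
    "y": [9.41639, 7.3365, 9.87284, 4.63237],
}
best_x, best_y = best_of_history(history)
```

With these values the best entry is the third evaluation, which matches the 9.87284 reported by solver.get_best_performance().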