Optimizing protein stability using random mutations

In this example, we optimize the thermal stability of mutants of a wildtype protein. To do so, we use the foldx_stability problem.

Warning

In the particular case of foldx-related black boxes, you will need to have foldx itself properly installed. Check our documentation on installing foldx.

You can also install all of the dependencies to run it using

pip install poli-core[foldx]

If you have done everything correctly, you should be able to run

~/foldx/foldx --version
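The same check can be done from Python before creating the black box. This is a minimal sketch, assuming the binary lives at ~/foldx/foldx as in the command above. The helper name foldx_binary_is_available is hypothetical and is not part of poli.

```python
import os
from pathlib import Path


def foldx_binary_is_available(binary_path: str = "~/foldx/foldx") -> bool:
    """Return True if an executable file exists at binary_path.

    Hypothetical helper for this tutorial; the default path is an assumption
    taken from the shell command above.
    """
    path = Path(binary_path).expanduser()
    return path.is_file() and os.access(path, os.X_OK)


# Prints True only if an executable file is found at the assumed location.
print(foldx_binary_is_available())
```

This only confirms that an executable file is present. It does not run foldx, so it cannot detect a broken or expired installation.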

Optimizing mRouge

In this example, we will focus on optimizing mRouge, also known as 3NED, one of the red fluorescent proteins explored in LaMBO [Stanton et al., 2022]. Before optimization, we need to download its PDB file and “repair” it (see single mutations using foldx).

We assume that the repaired file is already in the current working directory.

!ls
3ned_Repair.pdb                    optimizing_protein_stability.ipynb
from pathlib import Path

wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists()  # Should say True
True

Defining the objective function

In this tutorial, we optimize the stability of mRouge using the foldx_stability black box. The first step is creating it:

from poli.objective_repository import FoldXStabilityProblemFactory

problem_factory = FoldXStabilityProblemFactory()

problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0
poli 🧪: Starting the function foldx_stability as an isolated process.

problem_factory.create returns a Problem instance. Problems have the following useful attributes:

  1. A black-box function in problem.black_box. In this case, it is a FoldXStabilityBlackBox.

  2. An initial design in problem.x0: np.ndarray.

  3. All the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in problem.info: BlackBoxInformation.

These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the AbstractSolver in poli_baselines.core.abstract_solver).

Optimizing using a RandomMutation solver

In this tutorial we use the simplest baseline for discrete sequence optimization: RandomMutation, which takes the best-performing sequence and mutates it by selecting a position at random and replacing the token at that position with another element of the alphabet.
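To make the mutation step concrete, here is a minimal standalone sketch of it. This is not the poli_baselines implementation. The function name random_mutation, the 20-letter amino acid alphabet, and the choice to exclude the current residue (so the mutant always differs from its parent) are all assumptions made for illustration.

```python
import random

# The 20 standard amino acids. Assumption for this sketch; the real alphabet
# is defined by the black box.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")


def random_mutation(sequence, alphabet=AMINO_ACIDS, rng=None):
    """Return a copy of `sequence` with one randomly chosen position changed."""
    rng = rng or random.Random()
    position = rng.randrange(len(sequence))
    # Exclude the current token so that the mutant always differs from the parent.
    choices = [token for token in alphabet if token != sequence[position]]
    mutant = list(sequence)
    mutant[position] = rng.choice(choices)
    return mutant


# First ten residues of the mRouge sequence used in this tutorial.
parent = list("EEDNMAIIKE")
mutant = random_mutation(parent, rng=random.Random(0))
n_changes = sum(a != b for a, b in zip(parent, mutant))
print("".join(parent), "->", "".join(mutant), f"({n_changes} change)")
```

Passing a seeded random.Random instance keeps the example reproducible without touching the global random state.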

Note

There’s nothing special about RandomMutation here. You could drop in any solver you implement, as long as it

  1. inherits from AbstractSolver in poli_baselines.core.abstract_solver, and

  2. implements the abstract method next_candidate() -> np.ndarray.

Check this tutorial on creating solvers for more details.
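The two requirements above can be illustrated with a self-contained stand-in. The classes SketchSolver and SketchRandomMutation and the toy black box count_a are hypothetical and do not come from poli_baselines. They only mimic what this tutorial describes: a next_candidate() method, a solve(max_iter) loop, and a history dictionary with "x" and "y" keys. The real AbstractSolver may differ in its constructor and internals.

```python
import random
from abc import ABC, abstractmethod


class SketchSolver(ABC):
    """Hypothetical stand-in for a step-by-step solver (not poli_baselines' class)."""

    def __init__(self, black_box, x0, y0):
        self.black_box = black_box
        self.history = {"x": [x0], "y": [y0]}

    @abstractmethod
    def next_candidate(self):
        """Propose the next design to evaluate."""

    def solve(self, max_iter):
        # Each iteration proposes one candidate, evaluates it, and records both.
        for _ in range(max_iter):
            x = self.next_candidate()
            self.history["x"].append(x)
            self.history["y"].append(self.black_box(x))


class SketchRandomMutation(SketchSolver):
    def __init__(self, black_box, x0, y0, alphabet, seed=0):
        super().__init__(black_box, x0, y0)
        self.alphabet = alphabet
        self.rng = random.Random(seed)

    def next_candidate(self):
        # Start from the best design seen so far (highest y) and mutate one position.
        ys = self.history["y"]
        best_index = max(range(len(ys)), key=ys.__getitem__)
        candidate = list(self.history["x"][best_index])
        position = self.rng.randrange(len(candidate))
        candidate[position] = self.rng.choice(self.alphabet)
        return candidate


def count_a(x):
    """Toy black box standing in for foldx: counts the 'A' tokens in x."""
    return float(x.count("A"))


x0 = list("EEDNMAIIKE")
solver = SketchRandomMutation(count_a, x0, count_a(x0), alphabet=list("ACDE"))
solver.solve(max_iter=3)
print(len(solver.history["y"]))  # 4 entries: the initial design plus 3 iterations
```

Because the loop only ever calls next_candidate(), swapping in a different proposal strategy means overriding that one method.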

from poli_baselines.solvers.simple.random_mutation import RandomMutation

y0 = f(x0)

solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)

And that’s it! You can optimize the objective function passed as black_box by calling the .solve(max_iter) method (be careful, this might take a while):

solver.solve(max_iter=3)

Checking the results

After optimization, the results are stored inside solver.history, which is a dictionary with "x" and "y" keys. Let’s check what the best optimization result was:

print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.41639]]), array([[7.3365]]), array([[9.87284]]), array([[4.63237]])]
best stability: [9.87284]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLIGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGEMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK
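The best value reported above can also be recovered by hand from a history dictionary. The sketch below copies the four y values from the output above and flattens them to plain floats. The "x" entries are hypothetical placeholders standing in for the actual sequences. It assumes that higher values are better, which matches the best stability printed above.

```python
# y values copied from the output above; "x" entries are placeholder labels.
history = {
    "x": ["design_0", "design_1", "design_2", "design_3"],
    "y": [9.41639, 7.3365, 9.87284, 4.63237],
}

# Index of the highest y value (the solver's output above reports the maximum).
best_index = max(range(len(history["y"])), key=lambda i: history["y"][i])
best_y = history["y"][best_index]

print(best_index, best_y)  # 2 9.87284
```

Index 0 corresponds to the initial design x0, so here the best design was found on the second iteration.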