Optimizing protein stability using random mutations#

Stability optimization is a registered problem#

poli provides a get_problems() function, which lists the problems you can create.

from poli import get_problems

get_problems()

The returned list includes foldx_stability, so it is already available in the repository.

Let’s stick with it as a problem name:

problem_name = "foldx_stability"

Optimizing mRouge#

In this example, we will focus on optimizing mRouge, also known as 3NED, one of the red fluorescent proteins explored in LaMBO [Stanton et al., 2022]. Before optimization, we need to download the file and “repair” it (see single mutations using foldx).

We assume that the repaired file is already in the working directory, which contains:

3ned_Repair.pdb                    optimizing_protein_stability.ipynb

from pathlib import Path

wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists()  # Should say True

Defining the objective function#

In this tutorial, we optimize the stability of mRouge using the foldx_stability black box. The first step is creating it:


For foldx-related black boxes, FoldX itself must be properly installed. Check our documentation on installing foldx.

from poli.objective_repository import FoldXStabilityProblemFactory

problem_factory = FoldXStabilityProblemFactory()

problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path,  # the repaired 3NED structure defined above
)
f, x0 = problem.black_box, problem.x0
poli 🧪: Starting the function foldx_stability as an isolated process.

problem_factory.create returns a Problem instance. Problems have the following useful attributes:

  1. a black-box function in problem.black_box (in this case, a FoldXStabilityBlackBox),

  2. an initial design in problem.x0: np.ndarray, and

  3. all the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in problem.info: BlackBoxInformation.

These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the AbstractSolver in poli_baselines.core.abstract_solver).

Optimizing using a RandomMutation solver#

In this tutorial we use the simplest baseline for discrete sequence optimization: RandomMutation, which takes the best-performing sequence so far, selects a position at random, and replaces the token there with another element of the alphabet.
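
A single mutation step of this kind can be sketched in plain Python. This is only an illustration, not the poli_baselines implementation: the function name random_mutation, the toy parent sequence, and the choice to exclude the current token (so the mutant always differs from its parent) are assumptions made for this sketch.

```python
import random


def random_mutation(sequence, alphabet, rng=random):
    """Return a copy of `sequence` with one position set to a different token."""
    position = rng.randrange(len(sequence))
    # Assumption of this sketch: exclude the current token so that the
    # mutant always differs from its parent in exactly one position.
    choices = [token for token in alphabet if token != sequence[position]]
    mutant = list(sequence)
    mutant[position] = rng.choice(choices)
    return mutant


# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

rng = random.Random(0)  # seeded for reproducibility
parent = list("MKTAYIAK")  # hypothetical toy sequence, not mRouge
child = random_mutation(parent, AMINO_ACIDS, rng)
print("".join(parent), "->", "".join(child))
```

Repeating this step on the best sequence found so far, and evaluating each mutant with the black box, gives the greedy loop described above.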


There’s nothing special about RandomMutation here. You could drop in any solver you implement, as long as it

  1. inherits from AbstractSolver in poli_baselines.core.abstract_solver, and

  2. implements the abstract method next_candidate() -> np.ndarray.

Check this tutorial on creating solvers for more details.
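
To make the shape of that contract concrete, here is a self-contained sketch that does not import poli_baselines at all. ToyAbstractSolver is a stand-in written for this example (it is not the library's AbstractSolver, whose constructor and helpers may differ), UniformRandomSolver is a plain random-search rule chosen only because it is short, and count_alanines is a hypothetical toy objective replacing the expensive FoldX black box.

```python
import random
from abc import ABC, abstractmethod


class ToyAbstractSolver(ABC):
    """Stand-in base class for illustration; NOT poli_baselines' AbstractSolver."""

    def __init__(self, black_box, x0, y0):
        self.black_box = black_box
        # Same layout as described in this tutorial: "x" and "y" keys.
        self.history = {"x": [x0], "y": [y0]}

    @abstractmethod
    def next_candidate(self):
        """Propose the next design to evaluate."""

    def solve(self, n_iters):
        # Propose, evaluate, record; repeated n_iters times.
        for _ in range(n_iters):
            x = self.next_candidate()
            self.history["x"].append(x)
            self.history["y"].append(self.black_box(x))

    def get_best_solution(self):
        ys = self.history["y"]
        best_index = max(range(len(ys)), key=lambda i: ys[i])
        return self.history["x"][best_index]


class UniformRandomSolver(ToyAbstractSolver):
    """Pure random search: ignores the history and samples a fresh sequence."""

    def __init__(self, black_box, x0, y0, alphabet, seed=0):
        super().__init__(black_box, x0, y0)
        self.alphabet = alphabet
        self.length = len(x0)
        self.rng = random.Random(seed)

    def next_candidate(self):
        return [self.rng.choice(self.alphabet) for _ in range(self.length)]


def count_alanines(x):
    # Hypothetical toy objective standing in for an expensive black box.
    return x.count("A")


x0 = list("MKTAYIAK")  # hypothetical toy sequence
solver = UniformRandomSolver(
    count_alanines, x0, count_alanines(x0), list("ACDEFGHIKLMNPQRSTVWY")
)
solver.solve(n_iters=5)
best = solver.get_best_solution()
print("".join(best), count_alanines(best))
```

The only part that changes between solvers in this sketch is next_candidate; the evaluation loop and the history bookkeeping stay in the base class.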

from poli_baselines.solvers.simple.random_mutation import RandomMutation

y0 = f(x0)

solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)

And that’s it! You can optimize the objective function passed as black_box by calling the .solve(n_iters) method (be careful, this might take a while):


Checking the results#

After optimization, the results are stored inside solver.history, which is a dictionary with "x" and "y" keys. Let’s check what the best optimization result was:

print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.46959]]), array([[10.4687]]), array([[9.14886]]), array([[6.56841]])]
best stability: [10.4687]
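
The best value reported above can also be recovered by hand from a dictionary with "x" and "y" keys. The sketch below is a simplification under stated assumptions: the y values are copied from the output above but flattened to plain floats (the real entries are arrays of shape (1, 1)), and the x entries are placeholder labels rather than actual sequences.

```python
# y values copied from the run shown above, flattened to floats.
# The x entries are placeholder labels, not real sequences.
history = {
    "x": ["design_0", "design_1", "design_2", "design_3"],
    "y": [9.46959, 10.4687, 9.14886, 6.56841],
}

# Index of the evaluation with the highest objective value.
best_index = max(range(len(history["y"])), key=lambda i: history["y"][i])
best_x = history["x"][best_index]
best_y = history["y"][best_index]

print(f"best y: {best_y} (evaluation {best_index}, {best_x})")
```

Index 0 corresponds to the initial design x0, so in this run the best value was found on the first mutation.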