Optimizing protein stability using random mutations

Stability optimization is a registered problem

poli provides a get_problems() function that lists the problems you can create.

from poli import get_problems
get_problems()
['aloha',
 'dockstring',
 'drd3_docking',
 'foldx_rfp_lambo',
 'foldx_sasa',
 'foldx_stability',
 'foldx_stability_and_sasa',
 'gfp_cbas',
 'gfp_select',
 'penalized_logp_lambo',
 'rasp',
 'rdkit_logp',
 'rdkit_qed',
 'rfp_foldx_stability_and_sasa',
 'sa_tdc',
 'super_mario_bros',
 'white_noise',
 'toy_continuous_problem']

As you can see, foldx_stability is already available in the repository.

Let’s stick with it as a problem name:

problem_name = "foldx_stability"
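A typo in the problem name is easy to make, so it can help to check the name against the list before going further. The following is a minimal sketch: check_problem_name is a hypothetical helper (not part of poli), and the list is a hand-copied subset of the output above. In practice you would pass the result of get_problems() instead.

```python
from difflib import get_close_matches

# Hand-copied subset of the get_problems() output shown above.
available_problems = [
    "foldx_sasa",
    "foldx_stability",
    "foldx_stability_and_sasa",
    "rasp",
]


def check_problem_name(name, available):
    """Hypothetical helper: return name if listed, else raise with suggestions."""
    if name in available:
        return name
    suggestions = get_close_matches(name, available, n=3)
    raise ValueError(f"Unknown problem {name!r}. Close matches: {suggestions}")


problem_name = check_problem_name("foldx_stability", available_problems)
```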

Optimizing mRouge

In this example, we will focus on optimizing mRouge, also known as 3NED, one of the red fluorescent proteins explored in LaMBO [Stanton et al., 2022]. Before optimization, we need to download its PDB file and “repair” it (see single mutations using foldx).

We assume that the repaired file is already in the current working directory.

!ls
3ned_Repair.pdb                    optimizing_protein_stability.ipynb

from pathlib import Path

wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists()  # Should say True
True
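If you run this outside a notebook, a silent False is easy to miss. Here is a small sketch that fails loudly instead. require_pdb is a hypothetical helper name introduced for illustration; it only uses pathlib.

```python
from pathlib import Path


def require_pdb(path):
    """Hypothetical helper: resolve path and raise if the file is missing."""
    resolved = Path(path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Expected a repaired PDB file at {resolved}")
    return resolved
```

With the repaired file in place, wildtype_pdb_path = require_pdb("./3ned_Repair.pdb") gives the same absolute path as the cell above.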

Defining the objective function

In this tutorial, we optimize the stability of mRouge using the foldx_stability black box. The first step is creating it:

Warning

FoldX-related black boxes require FoldX to be properly installed. Check our documentation on installing foldx.

from poli.objective_repository import FoldXStabilityProblemFactory

problem_factory = FoldXStabilityProblemFactory()

problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0
poli 🧪: Starting the function foldx_stability as an isolated process.

problem_factory.create returns a Problem instance. Problems have the following useful attributes:

  1. A black-box function in problem.black_box. In this case, it is a FoldXStabilityBlackBox.

  2. An initial design in problem.x0, an np.ndarray.

  3. All the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in problem.info, a BlackBoxInformation.

These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the AbstractSolver in poli_baselines.core.abstract_solver).

Optimizing using a RandomMutation solver

In this tutorial we use the simplest baseline for discrete sequence optimization: a RandomMutation solver, which takes the best-performing sequence found so far and mutates it by choosing a position at random and replacing the element there with another element of the alphabet.
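To make the mutation step concrete, here is a standalone sketch of a single random mutation in plain Python. It does not reproduce the RandomMutation implementation in poli_baselines; mutate_once, AMINO_ACIDS and the short toy sequence are names and values introduced for illustration, and excluding the current residue from the choices is an assumption of this sketch.

```python
import random


def mutate_once(sequence, alphabet, rng):
    """Return a copy of sequence with one randomly chosen position changed."""
    position = rng.randrange(len(sequence))
    # Assumption of this sketch: exclude the current element so the
    # mutation always changes the sequence.
    choices = [token for token in alphabet if token != sequence[position]]
    mutant = list(sequence)
    mutant[position] = rng.choice(choices)
    return mutant


AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
rng = random.Random(0)  # seeded so the example is reproducible
parent = list("EEDNMAIIKE")  # short toy sequence
child = mutate_once(parent, AMINO_ACIDS, rng)
print("".join(parent), "->", "".join(child))
```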

Note

There’s nothing special about RandomMutation here. You could drop in any solver you implement, as long as it

  1. inherits from AbstractSolver in poli_baselines.core.abstract_solver, and

  2. implements the abstract method next_candidate() -> np.ndarray.

Check this tutorial on creating solvers for more details.
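To illustrate the idea of swapping in a different proposal rule, here is a toy solver that swaps two positions of the best sequence instead of mutating one. This is a standalone stand-in: ToySwapSolver does not inherit from the real AbstractSolver, and its constructor, history bookkeeping and solve loop are assumptions made so the example runs without poli_baselines or FoldX. Only the method name next_candidate and the "x"/"y" history keys follow this tutorial.

```python
import random


class ToySwapSolver:
    """Standalone stand-in; does NOT inherit from poli_baselines' AbstractSolver."""

    def __init__(self, black_box, x0, y0, seed=0):
        self.black_box = black_box
        self.rng = random.Random(seed)
        self.history = {"x": [list(x0)], "y": [y0]}

    def get_best_solution(self):
        ys = self.history["y"]
        best_index = max(range(len(ys)), key=ys.__getitem__)
        return self.history["x"][best_index]

    def next_candidate(self):
        # Proposal rule: swap two randomly chosen positions of the best sequence.
        candidate = list(self.get_best_solution())
        i, j = self.rng.sample(range(len(candidate)), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        return candidate

    def solve(self, max_iter):
        for _ in range(max_iter):
            x = self.next_candidate()
            self.history["x"].append(x)
            self.history["y"].append(self.black_box(x))


TARGET = list("ACDEF")  # toy target, a cheap substitute for a real objective


def matches_target(x):
    """Toy black box: number of positions that agree with TARGET."""
    return sum(a == b for a, b in zip(x, TARGET))


x0 = list("FEDCA")
solver = ToySwapSolver(matches_target, x0, matches_target(x0))
solver.solve(max_iter=5)
print(solver.history["y"])
```

In a real solver, the proposal rule inside next_candidate is the only part you would need to rewrite; the base class in poli_baselines is expected to supply the rest.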

from poli_baselines.solvers.simple.random_mutation import RandomMutation

y0 = f(x0)

solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)

And that’s it! You can optimize the objective function passed as black_box by calling the .solve(max_iter) method. Be careful: this might take a while, because every iteration evaluates the FoldX black box.

solver.solve(max_iter=3)

Checking the results

After optimization, the results are stored inside solver.history, which is a dictionary with "x" and "y" keys. Let’s check what the best optimization result was:

print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.46959]]), array([[10.4687]]), array([[9.14886]]), array([[6.56841]])]
best stability: [10.4687]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGIMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK
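It can also be useful to know which evaluation produced the best value and how far it moved from the starting point. The sketch below works on the y values copied from the output above, with nested lists standing in for the (1, 1) arrays. It assumes that the first entry of the history is the evaluation of x0 and that the solver treats larger values as better, which is consistent with the best value reported above.

```python
# y values copied from the output above; nested lists stand in for (1, 1) arrays.
history_y = [[[9.46959]], [[10.4687]], [[9.14886]], [[6.56841]]]

# Flatten each (1, 1) entry to a plain float.
scores = [y[0][0] for y in history_y]

# Index of the best evaluation, assuming larger values are better.
best_index = max(range(len(scores)), key=scores.__getitem__)

# Assumption: entry 0 is the evaluation of the starting sequence x0.
change_vs_start = scores[best_index] - scores[0]

print(f"Best evaluation index: {best_index}")
print(f"Best value: {scores[best_index]}")
print(f"Change relative to the first evaluation: {change_vs_start:.5f}")
```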