Optimizing protein stability using random mutations#

In this example, we optimize the thermal stability of mutations from a wildtype protein. To do so, we use the foldx_stability problem.

Warning

In the particular case of foldx-related black boxes, you will need to have it properly installed. Check our documentation on installing foldx.

You can also install all of the dependencies to run it using

pip install poli-core[foldx]

If you have done everything correclty, you should be able to run

~/foldx/foldx --version

Optimizing mRouge#

In this example, we will focus on optimizing mRouge, also known as 3NED, one of the red fluorescent proteins explored in LaMBO [Stanton et al., 2022]. Before optimization, we need to download the file and “repair” it (see single mutations using foldx).

We assume that the repaired file is already here.

!ls
3ned_Repair.pdb                    optimizing_protein_stability.ipynb
from pathlib import Path

wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists()  # Should say True
True

Defining the objective function#

In this tutorial, we optimize the stability of mRogue using the foldx_stability black box. The first step is creating it:

from poli.objective_repository import FoldXStabilityProblemFactory

problem_factory = FoldXStabilityProblemFactory()

problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0
Hide code cell output
poli 🧪: Starting the function foldx_stability as an isolated process.

problem_factory.create returns a Problem instance. Problems have the following useful attributes:

  1. a black-box function in problem.black_box. In this case, it is a FoldXStabilityBlackBox.

  2. an initial design in problem.x0: np.ndarray, and

  3. All the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in problem.info: BlackBoxInformation.

These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the AbstractSolver in poli_baselines.core.abstract_solver).

Optimizing using a RandomMutation solver#

In this tutorial we use the simplest baseline for discrete sequence optimization: a RandomMutation which takes the best performing sequence and randomly mutates it by selecting a position at random, and altering for another element of the alphabet.

Note

There’s nothing special about RandomMutation here. You could drop-in any solver you implement as long as it

  1. Inherits from AbstractSolver in poli_baselines.core.abstract_solver, and it

  2. implements the abstract method next_candidate() -> np.ndarray.

Check this tutorial on creating solvers for more details.

from poli_baselines.solvers.simple.random_mutation import RandomMutation

y0 = f(x0)

solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)

And that’s it! You can optimize the objective function passed as black_box by just calling the .solve(n_iters) method: (be careful, this might take a while)

solver.solve(max_iter=3)
Hide code cell output
Traceback (most recent call last):
  File "/Users/sjt972/Projects/poli/src/poli/external_isolated_function_script.py", line 146, in <module>
    run(args.objective_name, args.port, args.password)
  File "/Users/sjt972/Projects/poli/src/poli/external_isolated_function_script.py", line 114, in run
    y = f(x, context=context)
  File "/Users/sjt972/anaconda3/envs/poli-docs2/lib/python3.9/site-packages/poli/objective_repository/foldx_stability/isolated_function.py", line 100, in __call__
    stability = foldx_interface.compute_stability(
  File "/Users/sjt972/Projects/poli/src/poli/core/util/proteins/foldx.py", line 460, in compute_stability
    self._simulate_mutations(pdb_file, mutations)
  File "/Users/sjt972/Projects/poli/src/poli/core/util/proteins/foldx.py", line 349, in _simulate_mutations
    subprocess.run(
  File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1126, in communicate
    self.wait()
  File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1189, in wait
    return self._wait(timeout=timeout)
  File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1933, in _wait
    (pid, sts) = self._try_wait(0)
  File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1891, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[5], line 1
----> 1 solver.solve(max_iter=3)

File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli_baselines/core/step_by_step_solver.py:133, in StepByStepSolver.solve(self, max_iter, n_initial_points, seed, break_at_performance, verbose, pre_step_callbacks, post_step_callbacks)
    130         callback(self)
    132 # Take a step, which in turn updates the local history.
--> 133 _, y = self.step()
    135 # Call the post-step callbacks
    136 if post_step_callbacks is not None:

File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli_baselines/core/step_by_step_solver.py:64, in StepByStepSolver.step(self)
     60 """
     61 Runs the solver for one iteration.
     62 """
     63 x = self.next_candidate()
---> 64 y = self.black_box(x)
     66 self.update(x, y)
     67 self.post_update(x, y)

File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/core/abstract_black_box.py:277, in AbstractBlackBox.__call__(self, x, context)
    275         f_batch = np.array(f_batch_).reshape(len(x_batch_), -1)
    276 else:  # iterative treatment
--> 277     f_batch = self._black_box(x_batch, context)
    279 assert (
    280     len(f_batch.shape) == 2
    281 ), f"len(f(x).shape)={len(f_batch.shape)}, expected 2"
    283 assert isinstance(
    284     f_batch, np.ndarray
    285 ), f"type(f)={type(f_batch)}, not np.ndarray"

File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/objective_repository/foldx_stability/register.py:151, in FoldXStabilityBlackBox._black_box(self, x, context)
    117 """
    118 Runs the given input x and pdb files provided
    119 in the context through FoldX and returns the
   (...)
    137 
    138 """
    139 inner_function = get_inner_function(
    140     isolated_function_name="foldx_stability__isolated",
    141     class_name="FoldXStabilityIsolatedLogic",
   (...)
    149     verbose=self.verbose,
    150 )
--> 151 return inner_function(x, context)

File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/core/util/isolation/external_function.py:49, in ExternalFunction.__call__(self, x, context)
     26 """
     27 Evaluates the black-box function.
     28 
   (...)
     46     The output data points.
     47 """
     48 self.process_wrapper.send(["QUERY", x, context])
---> 49 msg_type, *val = self.process_wrapper.recv()
     50 if msg_type == "EXCEPTION":
     51     e, traceback_ = val

File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/core/util/inter_process_communication/process_wrapper.py:152, in ProcessWrapper.recv(self)
    143 def recv(self):
    144     """
    145     Receive data from the connection.
    146 
   (...)
    150         The received data.
    151     """
--> 152     return self.conn.recv()

File ~/anaconda3/envs/poli-docs3/lib/python3.10/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
    248 self._check_closed()
    249 self._check_readable()
--> 250 buf = self._recv_bytes()
    251 return _ForkingPickler.loads(buf.getbuffer())

File ~/anaconda3/envs/poli-docs3/lib/python3.10/multiprocessing/connection.py:414, in Connection._recv_bytes(self, maxsize)
    413 def _recv_bytes(self, maxsize=None):
--> 414     buf = self._recv(4)
    415     size, = struct.unpack("!i", buf.getvalue())
    416     if size == -1:

File ~/anaconda3/envs/poli-docs3/lib/python3.10/multiprocessing/connection.py:379, in Connection._recv(self, size, read)
    377 remaining = size
    378 while remaining > 0:
--> 379     chunk = read(handle, remaining)
    380     n = len(chunk)
    381     if n == 0:

KeyboardInterrupt: 

Checking the results#

After optimization, the results are stored inside solver.history, which is a dictionary with "x" and "y" keys. Let’s check what the best optimization result was:

print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.41639]]), array([[7.3365]]), array([[9.87284]]), array([[4.63237]])]
best stability: [9.87284]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLIGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGEMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK