Optimizing protein stability using random mutations#
In this example, we optimize the thermal stability of mutations from a wildtype protein. To do so, we use the foldx_stability
problem.
Warning
In the particular case of foldx
-related black boxes, you will need to have it properly installed. Check our documentation on installing foldx.
You can also install all of the dependencies to run it using
pip install poli-core[foldx]
If you have done everything correclty, you should be able to run
~/foldx/foldx --version
Optimizing mRouge
#
In this example, we will focus on optimizing mRouge
, also known as 3NED
, one of the red fluorescent proteins explored in LaMBO
[Stanton et al., 2022]. Before optimization, we need to download the file and “repair” it (see single mutations using foldx).
We assume that the repaired file is already here.
!ls
3ned_Repair.pdb optimizing_protein_stability.ipynb
from pathlib import Path
wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists() # Should say True
True
Defining the objective function#
In this tutorial, we optimize the stability of mRogue
using the foldx_stability
black box. The first step is creating it:
from poli.objective_repository import FoldXStabilityProblemFactory
problem_factory = FoldXStabilityProblemFactory()
problem = problem_factory.create(
wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0
Show code cell output
poli 🧪: Starting the function foldx_stability as an isolated process.
problem_factory.create
returns a Problem
instance. Problems have the following useful attributes:
a black-box function in
problem.black_box
. In this case, it is aFoldXStabilityBlackBox
.an initial design in
problem.x0: np.ndarray
, andAll the relevant information about the black box (e.g. whether it’s deterministic, what the alphabet is…) in
problem.info: BlackBoxInformation
.
These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the AbstractSolver
in poli_baselines.core.abstract_solver
).
Optimizing using a RandomMutation
solver#
In this tutorial we use the simplest baseline for discrete sequence optimization: a RandomMutation
which takes the best performing sequence and randomly mutates it by selecting a position at random, and altering for another element of the alphabet.
Note
There’s nothing special about RandomMutation
here. You could drop-in any solver you implement as long as it
Inherits from
AbstractSolver
inpoli_baselines.core.abstract_solver
, and itimplements the abstract method
next_candidate() -> np.ndarray
.
from poli_baselines.solvers.simple.random_mutation import RandomMutation
y0 = f(x0)
solver = RandomMutation(
black_box=f,
x0=x0,
y0=y0,
)
And that’s it! You can optimize the objective function passed as black_box
by just calling the .solve(n_iters)
method: (be careful, this might take a while)
solver.solve(max_iter=3)
Show code cell output
Traceback (most recent call last):
File "/Users/sjt972/Projects/poli/src/poli/external_isolated_function_script.py", line 146, in <module>
run(args.objective_name, args.port, args.password)
File "/Users/sjt972/Projects/poli/src/poli/external_isolated_function_script.py", line 114, in run
y = f(x, context=context)
File "/Users/sjt972/anaconda3/envs/poli-docs2/lib/python3.9/site-packages/poli/objective_repository/foldx_stability/isolated_function.py", line 100, in __call__
stability = foldx_interface.compute_stability(
File "/Users/sjt972/Projects/poli/src/poli/core/util/proteins/foldx.py", line 460, in compute_stability
self._simulate_mutations(pdb_file, mutations)
File "/Users/sjt972/Projects/poli/src/poli/core/util/proteins/foldx.py", line 349, in _simulate_mutations
subprocess.run(
File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 507, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1126, in communicate
self.wait()
File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1189, in wait
return self._wait(timeout=timeout)
File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1933, in _wait
(pid, sts) = self._try_wait(0)
File "/Users/sjt972/anaconda3/envs/poli__protein/lib/python3.9/subprocess.py", line 1891, in _try_wait
(pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Cell In[5], line 1
----> 1 solver.solve(max_iter=3)
File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli_baselines/core/step_by_step_solver.py:133, in StepByStepSolver.solve(self, max_iter, n_initial_points, seed, break_at_performance, verbose, pre_step_callbacks, post_step_callbacks)
130 callback(self)
132 # Take a step, which in turn updates the local history.
--> 133 _, y = self.step()
135 # Call the post-step callbacks
136 if post_step_callbacks is not None:
File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli_baselines/core/step_by_step_solver.py:64, in StepByStepSolver.step(self)
60 """
61 Runs the solver for one iteration.
62 """
63 x = self.next_candidate()
---> 64 y = self.black_box(x)
66 self.update(x, y)
67 self.post_update(x, y)
File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/core/abstract_black_box.py:277, in AbstractBlackBox.__call__(self, x, context)
275 f_batch = np.array(f_batch_).reshape(len(x_batch_), -1)
276 else: # iterative treatment
--> 277 f_batch = self._black_box(x_batch, context)
279 assert (
280 len(f_batch.shape) == 2
281 ), f"len(f(x).shape)={len(f_batch.shape)}, expected 2"
283 assert isinstance(
284 f_batch, np.ndarray
285 ), f"type(f)={type(f_batch)}, not np.ndarray"
File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/objective_repository/foldx_stability/register.py:151, in FoldXStabilityBlackBox._black_box(self, x, context)
117 """
118 Runs the given input x and pdb files provided
119 in the context through FoldX and returns the
(...)
137
138 """
139 inner_function = get_inner_function(
140 isolated_function_name="foldx_stability__isolated",
141 class_name="FoldXStabilityIsolatedLogic",
(...)
149 verbose=self.verbose,
150 )
--> 151 return inner_function(x, context)
File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/core/util/isolation/external_function.py:49, in ExternalFunction.__call__(self, x, context)
26 """
27 Evaluates the black-box function.
28
(...)
46 The output data points.
47 """
48 self.process_wrapper.send(["QUERY", x, context])
---> 49 msg_type, *val = self.process_wrapper.recv()
50 if msg_type == "EXCEPTION":
51 e, traceback_ = val
File ~/anaconda3/envs/poli-docs3/lib/python3.10/site-packages/poli/core/util/inter_process_communication/process_wrapper.py:152, in ProcessWrapper.recv(self)
143 def recv(self):
144 """
145 Receive data from the connection.
146
(...)
150 The received data.
151 """
--> 152 return self.conn.recv()
File ~/anaconda3/envs/poli-docs3/lib/python3.10/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
248 self._check_closed()
249 self._check_readable()
--> 250 buf = self._recv_bytes()
251 return _ForkingPickler.loads(buf.getbuffer())
File ~/anaconda3/envs/poli-docs3/lib/python3.10/multiprocessing/connection.py:414, in Connection._recv_bytes(self, maxsize)
413 def _recv_bytes(self, maxsize=None):
--> 414 buf = self._recv(4)
415 size, = struct.unpack("!i", buf.getvalue())
416 if size == -1:
File ~/anaconda3/envs/poli-docs3/lib/python3.10/multiprocessing/connection.py:379, in Connection._recv(self, size, read)
377 remaining = size
378 while remaining > 0:
--> 379 chunk = read(handle, remaining)
380 n = len(chunk)
381 if n == 0:
KeyboardInterrupt:
Checking the results#
After optimization, the results are stored inside solver.history
, which is a dictionary with "x"
and "y"
keys. Let’s check what the best optimization result was:
print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")
All y values: [array([[9.41639]]), array([[7.3365]]), array([[9.87284]]), array([[4.63237]])]
best stability: [9.87284]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLIGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGEMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK