Protein (RFP) stability and SASA (using foldx, lambo)#

Type of objective function: discrete

Environment to run this objective function: poli protein

About#

This objective function returns stability using foldx and SASA, exactly as done in the lambo implementation.

Prerequisites#

foldx#

foldx must be installed and available in your home directory. We expect the following files to be there:

  • ~/foldx/foldx: the binary. You might need to rename it.

  • ~/foldx/rotabase.txt: a text file necessary for foldx to run.
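A quick way to confirm this layout before creating the objective is sketched below. The helper name `missing_foldx_files` is hypothetical (it is not part of poli or foldx); it only checks that the two files listed above exist and that the binary is executable.

```python
import os
from pathlib import Path
from typing import List


def missing_foldx_files(foldx_dir: Path = Path.home() / "foldx") -> List[str]:
    """Describe any problems with the expected foldx layout (hypothetical helper)."""
    problems = []
    binary = foldx_dir / "foldx"
    rotabase = foldx_dir / "rotabase.txt"

    # The binary must exist under the exact name "foldx" and be executable.
    if not binary.is_file():
        problems.append(f"missing binary: {binary}")
    elif not os.access(binary, os.X_OK):
        problems.append(f"binary is not executable: {binary}")

    # rotabase.txt must sit next to the binary.
    if not rotabase.is_file():
        problems.append(f"missing rotabase file: {rotabase}")

    return problems


if __name__ == "__main__":
    for problem in missing_foldx_files():
        print(problem)
```

An empty list means both files were found where this page expects them; it says nothing about whether the foldx license or version is valid.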

Python environment#

We recommend that you have cloned and installed the lambo repository. Since there are some files we can’t install automatically using pip install git+..., we recommend that you create a conda environment for the lambo tasks:

# From the root of the poli repository
conda env create --file src/poli/objective_repository/foldx_rfp_lambo/environment.yml

Activate the environment you just created using

conda activate poli__lambo

lambo#

We also need lambo’s tasks to be available in Python’s path for poli__lambo:

# In the poli__lambo environment
git clone https://github.com/samuelstanton/lambo    # For reference, we use 431b052
cd lambo
pip install -e .  

In particular, we need

  • lambo.tasks.proxy_rfp.proxy_rfp.ProxyRFPTask

  • the rfp data: see ~/lambo/assets/fpbase

Make sure the data is available.
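The two requirements above can be checked from inside the poli__lambo environment with a short script. This is a sketch under assumptions: `module_available` is a hypothetical helper, and the data path assumes the repository was cloned into your home directory, as the path above suggests.

```python
import importlib.util
from pathlib import Path


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located on Python's path."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or misconfigured.
        return False


if __name__ == "__main__":
    task_module = "lambo.tasks.proxy_rfp.proxy_rfp"
    print("lambo task module found:", module_available(task_module))

    # Assumption: lambo was cloned into the home directory.
    data_dir = Path.home() / "lambo" / "assets" / "fpbase"
    print("fpbase data found:", data_dir.is_dir())
```

Because looking up a dotted module name imports its parent packages, a broken dependency inside lambo may surface as a different exception than the ones caught here.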

How to run#

You can run this objective function either directly in the poli__lambo environment or as an isolated process (which runs that environment underneath).
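To tell which of the two cases applies to the current interpreter, one option is to inspect the `CONDA_DEFAULT_ENV` variable that conda sets on activation. The helper `in_conda_env` below is a hypothetical sketch, and it assumes the environment was activated by name rather than by path.

```python
import os
from typing import Mapping


def in_conda_env(
    expected: str = "poli__lambo",
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Return True if the active conda environment is named `expected`."""
    # conda stores the name of the activated environment in CONDA_DEFAULT_ENV.
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    if in_conda_env():
        print("Running inside poli__lambo.")
    else:
        print("Not inside poli__lambo; the objective would need to run as an isolated process.")
```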

After the setup described above, you can run the following code:

from poli import objective_factory

# Create the black box, together with an initial input and its objective values
f, x0, y0 = objective_factory.create(
    name="foldx_rfp_lambo",
)

# Example input:
print(x0)

# Objective values at x0:
print(y0)  # [[-11189.00587946    -39.8155    ], ...]

You can also pass a problem: ProblemSetupInformation to the create method. By default, the alphabet follows this encoding.

How to cite#

If you use this black box, we expect you to cite the following resources:

[1] Stanton, Samuel, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, and Andrew Gordon Wilson. “Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders.” arXiv, July 12, 2022. http://arxiv.org/abs/2203.12742.

[2] Schymkowitz, Joost, Jesper Borg, Francois Stricher, Robby Nys, Frederic Rousseau, and Luis Serrano. “The FoldX Web Server: An Online Force Field.” Nucleic Acids Research 33, no. Web Server issue (July 1, 2005): W382–88. https://doi.org/10.1093/nar/gki387.

[3] González-Duque, Miguel, Simon Bartels, and Richard Michael. “poli: A Library of Discrete Sequence Objectives.” Computer software, version 0.0.1, January 2024. https://github.com/MachineLearningLifeScience/poli.


@article{stanton:LaMBO:2022,
  title   = {Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders},
  author  = {Stanton, Samuel and Maddox, Wesley and Gruver, Nate and Maffettone, Phillip and Delaney, Emily and Greenside, Peyton and Wilson, Andrew Gordon},
  journal = {arXiv preprint arXiv:2203.12742},
  year    = {2022}
}

@article{Schymkowitz:foldx:2005,
  title   = {The FoldX web server: an online force field},
  author  = {Schymkowitz, Joost and Borg, Jesper and Stricher, Francois and Nys, Robby and Rousseau, Frederic and Serrano, Luis},
  journal = {Nucleic Acids Research},
  volume  = {33},
  pages   = {W382--W388},
  year    = {2005},
  month   = jul,
  ISSN    = {0305-1048},
  DOI     = {10.1093/nar/gki387}
}

@software{Gonzalez-Duque:poli:2024,
  author  = {González-Duque, Miguel and Bartels, Simon and Michael, Richard},
  month   = jan,
  title   = {{poli: a library of discrete sequence objectives}},
  url     = {https://github.com/MachineLearningLifeScience/poli},
  version = {0.0.1},
  year    = {2024}
}