SELFIES Examples

import sys

import selfies as sf
from rdkit import Chem

Standard Usage

First let’s try translating from SMILES to SELFIES, and then from SELFIES to SMILES. We will use a non-fullerene acceptor for organic solar cells as an example.

smiles = "CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)" \
         "C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1"
encoded_selfies = sf.encoder(smiles)  # SMILES --> SELFIES
decoded_smiles = sf.decoder(encoded_selfies)  # SELFIES --> SMILES

print(f"Original SMILES: {smiles}")
print(f"Translated SELFIES: {encoded_selfies}")
print(f"Translated SMILES: {decoded_smiles}")
Original SMILES: CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1
Translated SELFIES: [C][N][C][Branch1_2][C][=O][C][=C][Branch2_1][Ring2][Branch1_3][C][=C][C][=C][Branch1_1][Ring2][S][Ring1][Branch1_1][C][S][C][Branch1_1][N][C][=N][C][=C][Branch1_1][Ring1][C][#N][S][Ring1][Branch1_3][=C][C][Expl=Ring1][N][C][Ring1][S][O][C][C][O][Ring1][Branch1_1][N][Branch1_1][C][C][C][Branch1_2][C][=O][C][Ring2][Ring1][=N][=C][Ring2][Ring1][P][C][=C][C][=C][Branch1_1][Ring2][S][Ring1][Branch1_1][C][S][C][Branch1_1][N][C][=N][C][=C][Branch1_1][Ring1][C][#N][S][Ring1][Branch1_3][=C][C][Expl=Ring1][N][C][Ring1][S][O][C][C][O][Ring1][Branch1_1]
Translated SMILES: CN7C(=O)C6=C(C1=CC4=C(S1)C=3SC(C2=NC=C(C#N)S2)=CC=3C45OCCO5)N(C)C(=O)C6=C7C8=CC%11=C(S8)C=%10SC(C9=NC=C(C#N)S9)=CC=%10C%11%12OCCO%12

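When comparing the original and decoded SMILES, do not compare the strings with ==. The same molecule can be written as many different SMILES strings; here the decoded string is kekulized and uses different ring labels. Instead, canonicalize both SMILES with RDKit and compare the results.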

print(f"== Equals: {smiles == decoded_smiles}")

# Recommended
can_smiles = Chem.CanonSmiles(smiles)
can_decoded_smiles = Chem.CanonSmiles(decoded_smiles)
print(f"RDKit Equals: {can_smiles == can_decoded_smiles}")
== Equals: False
RDKit Equals: True
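
The comparison above can be wrapped in a small helper. The following is a minimal sketch, not part of SELFIES or RDKit: same_molecule is a hypothetical name, and it assumes that matching canonical SMILES is a sufficient identity check for your molecules. It also raises an explicit error when RDKit cannot parse one of the inputs.

```python
from rdkit import Chem


def same_molecule(smiles_a: str, smiles_b: str) -> bool:
    """Return True if both SMILES strings canonicalize to the same string.

    Hypothetical helper for illustration; raises ValueError on unparsable input.
    """
    canonical = []
    for smi in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smi)  # returns None if RDKit cannot parse
        if mol is None:
            raise ValueError(f"RDKit could not parse SMILES: {smi!r}")
        canonical.append(Chem.MolToSmiles(mol))  # canonical SMILES by default
    return canonical[0] == canonical[1]


# Benzene written in aromatic and in Kekule form
print(same_molecule("c1ccccc1", "C1=CC=CC=C1"))
# Ethanol vs. ethylamine
print(same_molecule("CCO", "CCN"))
```

In the notebook above, same_molecule(smiles, decoded_smiles) would replace the two Chem.CanonSmiles calls.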

Advanced Usage

Now let’s try to customize the SELFIES constraints. We will first look at the default SELFIES semantic constraints.

default_constraints = sf.get_semantic_constraints()
print(f"Default Constraints:\n {default_constraints}")
Default Constraints:
 {'H': 1, 'F': 1, 'Cl': 1, 'Br': 1, 'I': 1, 'O': 2, 'O+1': 3, 'O-1': 1, 'N': 3, 'N+1': 4, 'N-1': 2, 'C': 4, 'C+1': 5, 'C-1': 3, 'P': 5, 'P+1': 6, 'P-1': 4, 'S': 6, 'S+1': 7, 'S-1': 5, '?': 8}

Consider two compounds, CS=CC#S and [Li]=CC, which we first encode into SELFIES. Under the default constraints, they decode as shown below. Note that since Li is not listed in the constraints, it falls under the '?' entry and is limited to 8 bonds.

c_s_compound = sf.encoder("CS=CC#S")
li_compound = sf.encoder("[Li]=CC")

print(f"CS=CC#S --> {sf.decoder(c_s_compound)}")
print(f"[Li]=CC --> {sf.decoder(li_compound)}")
[Li]=CC --> [Li]=CC

We can add Li to the SELFIES constraints, and restrict it to 1 bond only. We can also restrict S to 2 bonds (instead of its default 6). After setting the new constraints, we can check to see if they were updated.

new_constraints = dict(default_constraints)  # copy, so default_constraints stays unchanged
new_constraints['Li'] = 1
new_constraints['S'] = 2

sf.set_semantic_constraints(new_constraints)  # update constraints

print(f"Updated Constraints:\n {sf.get_semantic_constraints()}")
Updated Constraints:
 {'H': 1, 'F': 1, 'Cl': 1, 'Br': 1, 'I': 1, 'O': 2, 'O+1': 3, 'O-1': 1, 'N': 3, 'N+1': 4, 'N-1': 2, 'C': 4, 'C+1': 5, 'C-1': 3, 'P': 5, 'P+1': 6, 'P-1': 4, 'S': 2, 'S+1': 7, 'S-1': 5, '?': 8, 'Li': 1}

Under the new constraints, the same SELFIES strings decode as follows. Notice that the decoded molecules satisfy the new limits: Li now forms a single bond instead of a double bond.

print(f"CS=CC#S --> {sf.decoder(c_s_compound)}")
print(f"[Li]=CC --> {sf.decoder(li_compound)}")
[Li]=CC --> [Li]CC

To revert to the default constraints, simply call:
