4. Input Parameters

Basic inputs and outputs

4.1. CALYPSO Inputs —— toml

The main input file, named input.toml, contains all the parameters needed for the structure prediction. The file consists of input tags that can be given in any order; a tag may also be omitted, in which case its default value is used. Below is a quick overview of the tag syntax:

  1. the general syntax is consistent with TOML; more information about this file format can be found here.

  2. the labels are case-insensitive.

  3. all text following the “#” character is taken as comment.

  4. logical values can be given as t (or true), or f (or false).

  5. null is allowed.

Below are brief descriptions of the necessary input parameters.

4.1.1. Common parameters in CALYPSO block

4.1.1.1. Systemname

SystemName = string

A descriptive string for the target system (max. 40 characters).

Default: CALYPSO

4.1.1.2. Seed

seed = integer

A positive integer sets the random seed for reproducibility; a negative value leaves the seed unset.

Default: -1

4.1.1.3. IType

IType = int or string

Control the type of structures to be generated.

IType (int)  IType (string)  Module
1            CRYSTAL         Crystal structure prediction
2            CLUSTER         Cluster structure prediction
3            MOLECULAR       Molecular crystal structure prediction
4            LAYER           Layer (including film) structure prediction

Either an int or a string can be used to specify the type of structure prediction. If a string is used, it must be uppercase.

Default: 1

4.1.1.4. ICode

ICode = integer or string

Defines which code is used for local structure optimization during the structure prediction.

1:

VASP

3:

GULP

4:

PWSCF

9:

LAMMPS

15:

MLP

Default: 1

4.1.1.5. IAlgo

IAlgo = integer or string

Defines which algorithm is adopted in the simulation.

1:

global PSO algorithm

2:

local PSO algorithm

3:

ABC algorithm with symmetry

Default: 2

4.1.1.6. IDisp

IDisp = integer or string

1:

ORCH

The built-in task dispatcher of CALYPSO. Support for other third-party libraries will be implemented.

Default: 1

4.1.1.7. IFit

IFit = integer or string

Defines the fitness used to drive the structural evolution of the population.

1:

ENTHALPY

2:

HARDNESS

3:

GIBBS

Default: 1

4.1.1.8. IRunner

IRunner = int

Defines how CALYPSO is run.

1:

automatically run

2:

manually run each step (split mode)

Default: 1

4.1.1.9. ISim

ISim = int or string

Defines the structure descriptor, which is used to determine whether two structures are similar.

0:

NAN

1:

BCM

2:

CCF

BCM is faster than CCF, so we suggest using BCM in most cases. If a similarity warning appears while generating structures, decrease the value of SimThreshold or turn off the similarity comparison by setting ISim = 0.

Default: 1

4.1.1.10. BlockMode

BlockMode = bool

Defines how the evolution proceeds.

true:

evolution will be performed after each generation is done.

false:

evolution will be performed as soon as each structure's local optimization is done.

Warning

Currently only BlockMode = true is supported.

Default: true

4.1.1.11. PickUp

PickUp = bool

Whether to pick up (resume) a previous calculation. CALYPSO supports pickup at any stage; just turn this option on.

Pickup can resume not only an aborted CALYPSO task but also a finished one with a changed MaxStep, which lets you keep the existing evolution information and continue the run.

true:

pick up the old calculation.

false:

start a new calculation.

Default: false

4.1.2. Parameters for evolution in CALYPSO.EVO block

4.1.2.1. NBest

NBest = int

Defines how many parts the potential energy surface (PES) is separated into; PSO moves toward the closest one to generate the next structure.

In global PSO, NBest is equal to 1.

Default: 4

4.1.2.2. PsoRatio

PsoRatio = float

Defines the fraction of structures per generation that is produced by PSO.

The remaining structures are generated randomly with symmetry constraints.

Default: 0.6

4.1.2.3. SabcRatio

SabcRatio = list of float

Defines the fractions of scouts, employees, and onlookers, in which:

  • scouts choose a different space group

  • employees choose different atomic coordinates of the Wyckoff positions

  • onlookers choose a different combination of the Wyckoff positions

Make sure the three values sum to 1.0.

Default: [0.3, 0.2, 0.5]
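
The sum constraint is easy to check before a run is started. The helper below is a minimal sketch, not part of CALYPSO: the function name is hypothetical, and the mapping of the three list positions to scouts, employees, and onlookers is assumed from the order in which they are named above.

```python
import math

def check_sabc_ratio(ratio, tol=1e-8):
    """Validate a [scouts, employees, onlookers] list (order is assumed)."""
    if len(ratio) != 3:
        raise ValueError("SabcRatio needs exactly three values")
    if any(r < 0.0 or r > 1.0 for r in ratio):
        raise ValueError("each ratio must lie in [0, 1]")
    # Compare with a tolerance so floating-point round-off does not matter.
    if not math.isclose(sum(ratio), 1.0, abs_tol=tol):
        raise ValueError(f"ratios sum to {sum(ratio)}, expected 1.0")
    return dict(zip(("scouts", "employees", "onlookers"), ratio))

print(check_sabc_ratio([0.3, 0.2, 0.5]))
```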

4.1.2.4. PopSize

PopSize = integer

The population size, i.e., the total number of structures per generation.

Normally, a larger population size is needed for a larger system. A very large population size should be used for simulations with automatic variation of the chemical composition.

Default: 10

4.1.2.5. MaxStep

MaxStep = integer

The maximum number of generations to be executed for the entire structure prediction simulation.

Typically, a larger number of generations is needed for a larger system.

Default: 2

4.1.2.6. Temperature

Temperature = 300

The temperature used when considering the Gibbs free energy (IFit = 3). The algorithm can be found here.

The unit is Kelvin.

Default: 300

4.1.3. Parameters for generator in CALYPSO.GENERATOR block

4.1.3.1. basic parameters for each type of crystal structure prediction

4.1.3.2. FormulaUnit

FormulaUnit = list of string

For example, FormulaUnit = ['(LiH4)1-2(NH3)3-4'] means that we want to predict LiH4-NH3 structures in which the number of LiH4 units ranges from 1 to 2 and the number of NH3 units ranges from 3 to 4.

In crystal structure prediction, the length of FormulaUnit is 1. For layer structure prediction, the length of FormulaUnit is equal to the number of layers.

There is no default. You must define it.
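
To make the range notation concrete, the sketch below enumerates the unit counts covered by the example string. It is only an illustration: the function names and the regular expression are hypothetical, they handle just the parenthesised "(unit)min-max" form shown above, and they say nothing about how CALYPSO parses the string internally.

```python
import itertools
import re

# Matches one "(unit)min-max" group, e.g. "(LiH4)1-2".
UNIT = re.compile(r"\(([^()]+)\)(\d+)-(\d+)")

def parse_formula_unit(spec):
    """Return [(unit, min_count, max_count), ...] for '(A)m-n(B)p-q'."""
    return [(name, int(lo), int(hi)) for name, lo, hi in UNIT.findall(spec)]

def enumerate_compositions(spec):
    """List every combination of unit counts inside the given ranges."""
    units = parse_formula_unit(spec)
    names = [name for name, _, _ in units]
    ranges = [range(lo, hi + 1) for _, lo, hi in units]  # ranges are inclusive
    return [dict(zip(names, counts)) for counts in itertools.product(*ranges)]

for composition in enumerate_compositions("(LiH4)1-2(NH3)3-4"):
    print(composition)
```

For the example string this yields four combinations, from one LiH4 with three NH3 up to two LiH4 with four NH3.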

4.1.3.3. MaxNumAtom

MaxNumAtom = integer

The maximal number of atoms allowed in the simulation cell.

Default: 100

4.1.3.4. VolumeUnit

VolumeUnit = dict of string and int

Custom volume for each unit. Setting a value to 0 or leaving it empty means the volume is calculated from the covalent radius r (only available for single elements) as 1.3*(4/3)πr^3.

For example, VolumeUnit = {Li=10, H=10, N=10} means that the volumes of the Li, H, and N atoms are each equal to 10.

Warning

In TOML, string keys of a dict do not need to be quoted.

Default: {} <=> (1.3*(4/3)π(covalent radii)^3)
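
The default formula can be written out as a short sketch. The function names are hypothetical, and the covalent radius is passed in by the caller because the radii table used by CALYPSO is not reproduced in this document.

```python
import math

def default_atomic_volume(covalent_radius):
    """Default volume 1.3 * (4/3) * pi * r^3 for a covalent radius r."""
    return 1.3 * (4.0 / 3.0) * math.pi * covalent_radius ** 3

def resolve_volume(volume_unit, element, covalent_radius):
    """Use the custom volume if it is set and non-zero, else the default."""
    custom = volume_unit.get(element, 0)
    return custom if custom else default_atomic_volume(covalent_radius)

# The radius 1.0 is a placeholder, not a tabulated covalent radius.
print(resolve_volume({"Li": 10}, "Li", 1.0))   # custom value is used
print(round(resolve_volume({}, "Li", 1.0), 3)) # falls back to the formula
```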

4.1.3.5. DistanceOfIon

DistanceOfIon = list or dict

Minimum interatomic distances (in angstrom), given either as an (n+1)x(n+1) matrix or as a dict.

For example, DistanceOfIon = [["X", "Li", "H", "N"], ["Li", 1.0, 1.0, 1.0], ["H", 1.0, 1.0, 1.0], ["N", 1.0, 1.0, 1.0]] is equivalent to DistanceOfIon = {Li = 0.5, N = 0.5, H = 0.5}.

Default: {} <=> covalent radii
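
The equivalence in the example suggests that the dict values act as per-element radii whose pairwise sums give the minimum distances (0.5 + 0.5 = 1.0). That reading is an inference from this single example, not a documented rule, and the helper below is a hypothetical sketch of it.

```python
def radii_to_distance_matrix(radii):
    """Expand {element: radius} into the (n+1)x(n+1) matrix form.

    Assumes the minimum distance of a pair is the sum of the two radii.
    """
    elements = list(radii)
    matrix = [["X", *elements]]  # header row, "X" marks the corner cell
    for a in elements:
        matrix.append([a, *(radii[a] + radii[b] for b in elements)])
    return matrix

for row in radii_to_distance_matrix({"Li": 0.5, "H": 0.5, "N": 0.5}):
    print(row)
```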

4.1.3.6. SpaceGroup

SpaceGroup = list of int and string

Defines the range of space groups to be considered.

The rules for specifying space groups are:

  1. a single integer means a single space group number

  2. “int1-int2” means the space group numbers ranging from int1 to int2

  3. “int1:int2:int3” means the space group numbers ranging from int1 (inclusive) to int2 (exclusive) with step size int3, i.e. [int1, int2)

Note

The valid range differs between the structure generation methods.

  • crystal (IType = 1): SpaceGroup ranging from 1 to 230

  • cluster (IType = 2): SpaceGroup ranging from 1 to 31

  • molecular crystal (IType = 3): SpaceGroup ranging from 1 to 230

  • layer (IType = 4): SpaceGroup ranging from 1 to 17 for multi-layer, ranging from 1-230 for single layer.

Default: [1, "2-210", "211:231:1"]
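
The three notations can be expanded into explicit space group numbers with a few lines of Python. This is a sketch with a hypothetical function name; it assumes that the "int1-int2" form includes both ends, which is consistent with the default covering 1 to 230 but is not stated explicitly above.

```python
def expand_space_groups(spec):
    """Expand a SpaceGroup list into a sorted list of space group numbers."""
    numbers = set()
    for item in spec:
        if isinstance(item, int):
            numbers.add(item)                         # single space group
        elif ":" in item:
            start, stop, step = (int(x) for x in item.split(":"))
            numbers.update(range(start, stop, step))  # half-open [start, stop)
        elif "-" in item:
            start, stop = (int(x) for x in item.split("-"))
            numbers.update(range(start, stop + 1))    # assumed inclusive
        else:
            numbers.add(int(item))                    # number given as string
    return sorted(numbers)

default = expand_space_groups([1, "2-210", "211:231:1"])
print(default[0], default[-1], len(default))  # 1 230 230
```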

4.1.3.7. PrototypePath

PrototypePath = list of string

The paths that contain the prototype structures (files ending with .vasp).

For example, PrototypePath = ["path/to/vasp/poscar"]. At the very beginning, the code parses the provided paths and saves the structures into ~/.cache/calypso/prototype in files named {number of atoms}.csv, so all structures with the same number of atoms are saved together.

There is no default value. You must supply this variable if you want to use it.

4.1.3.8. PrototypeRatio

PrototypeRatio = float

The fraction of the randomly generated structures that is generated from prototypes.

Default: 0.0

4.1.3.9. bulk detail parameters

4.1.3.10. LengthMaxRatio

LengthMaxRatio = float

The maximum ratio between the lengths of a, b, and c.

Default: 5.0

4.1.3.11. LengthMinRatio

LengthMinRatio = float

The minimum ratio between the lengths of a, b, and c.

Default: 1.0

4.1.3.12. Extra Parameters for layer structure prediction

4.1.3.13. Thicknesses

Thicknesses = list of float

The thicknesses of thin films (in unit of angstrom).

The length of Thicknesses is equal to the length of FormulaUnit.

There is no default value. You must supply this variable if IType = 4.

4.1.3.14. Area

Area = float

The area (in unit of angstrom^2) per formula unit.

If you cannot provide a good estimation on the area, please use the default value. The program will automatically generate an estimated area by using the ionic radii of given atoms.

There is no default value. You must supply this variable if IType = 4.

4.1.3.15. Gaps

Gaps = list of float

The gaps between adjacent layers, i.e., the interlayer distances (in angstrom). The length of Gaps should be equal to the length of FormulaUnit, and the last value of Gaps is always the vacancy value.

For example, with FormulaUnit = ["MoS2", "CrI3"], the gaps can be set as Gaps = [2, 10], which means that the distance between the MoS2 and CrI3 layers is 2 angstrom and the vacancy is 10 angstrom.

There is no default value. You must supply this variable if IType = 4.
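
Because FormulaUnit, Thicknesses, and Gaps must all have one entry per layer, a quick consistency check can catch mistakes before a run. The function below is a hypothetical sketch, and the thickness values in the usage line are placeholders rather than recommended settings.

```python
def check_layer_inputs(formula_unit, thicknesses, gaps):
    """Check that the per-layer lists have matching lengths."""
    n_layers = len(formula_unit)
    if len(thicknesses) != n_layers:
        raise ValueError("Thicknesses must have one entry per layer")
    if len(gaps) != n_layers:
        raise ValueError("Gaps must have one entry per layer")
    return {
        "interlayer_gaps": list(gaps[:-1]),  # gaps between adjacent layers
        "vacancy": gaps[-1],                 # the last value is the vacancy
    }

print(check_layer_inputs(["MoS2", "CrI3"], [3.0, 3.0], [2, 10]))
```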

4.1.3.16. Extra Parameters for cluster structure prediction

4.1.3.17. Vacancy

Vacancy = list of float

The isolated cluster is placed into an orthorhombic box where the periodic boundary condition is applied.

This variable defines the separations (in unit of angstrom) between the studied cluster and its nearest-neighboring periodic images. It should be large enough to ensure that interactions between the studied cluster and its nearest-neighboring images are negligible.

For cluster structure prediction, we do not recommend using VASP for the structure optimization of large systems, since VASP calculations are computationally very expensive.

Default: [10.0, 10.0, 10.0]

4.1.3.18. cluster_type

ClusterType = string

normal:

the core-shell type cluster

cage:

the cage cluster

plane:

the plane cluster

Default: normal

4.1.3.19. Extra Parameters for molecule structure prediction

4.1.3.20. MoleculesPath

MoleculesPath = dict of string

The paths of the molecule files. The molecule names in FormulaUnit are resolved through the keys of this dict.

For example, if we have FormulaUnit = ["{Water}4"], then MoleculesPath = {'Water'='./H2O.xyz'}, so that Water is parsed as H2O.

Default: {}

4.1.4. Parameters for optimization in CALYPSO.OPT block

4.1.4.1. DFTInputPath

DFTInputPath = string

The path that contains the input files for the DFT code.

If an MLP with a model file is used, the model file should also be saved here.

Default: "./"

4.1.4.2. JobFlow

JobFlow = list of string

Defines the sequence of calculations to be conducted. The number of input files should be equal to the length of JobFlow.

Default: ["opt", "opt", "opt"]

4.1.4.3. PpMap

PpMap = dict of string

Defines the paths of the pseudopotential files and the elements they map to. Currently this works only for VASP.

For example, PpMap = {Li = "POTCAR_Li", Mg = "mmm"}

There is no default value. One must set it manually.

4.1.4.4. ShareFiles

ShareFiles = list of string

The absolute paths of the model or other files that need to be copied into the actual calculation directory.

For example, if VASP is used as the calculator with a vdW functional, which requires the vdw_kernel.bindat file, put the path of the kernel file in ShareFiles to make sure the kernel is available in each structure optimization.

Similarly, if an MLP is used as the calculator, the path of the model can be put here.

Default: []

4.1.4.5. Extra Parameters for MLP calculator

4.1.4.6. MLPType

MLPType = "dp"

dp:

deep potential

deepmd:

deep potential

dpa:

deep potential

dpa2:

deep potential

m3gnet:

chgnet:

mace_mp:

mace_off:

gulp:

emt:

lj:

morse:

Chooses which type of MLP is used.

There is no default value. One must set it manually.

4.1.4.7. MLPParams

MLPParams = {"model"="M3GNet-MP-2021.2.8-PES"}

The parameters for MLP initialization. For example:

chgnet: {"model"="0.3.0", "check_cuda_mem"=true, "on_isolated_atoms"="warn"}

dp: {"model"="path/to/model"}

Default: {}

4.1.4.8. OptAlgo

OptAlgo = string

The optimization algorithm.

LBFGS:

FIRE:

BFGS:

Default: "LBFGS"

4.1.4.9. OptStep

OptStep = int

The number of optimization steps.

Default: 1000

4.1.4.10. TrajFile

TrajFile = string

The filename of the optimization trajectory.

Default: traj.traj

4.1.4.11. Pstress

Pstress = float

The pressure (in GPa) applied when conducting MLP structure optimization.

Default: 0.0

4.1.4.12. Fmax

Fmax = float

The convergence criterion. The optimization stops when the force on every atom is smaller than Fmax.

Default: 0.1
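
The stopping rule can be illustrated with a small sketch. It assumes that "the force on an atom" means the Euclidean norm of that atom's force vector; the function name is hypothetical and the force values are made up for the illustration.

```python
import math

def is_converged(forces, fmax=0.1):
    """True when the force norm on every atom is below fmax.

    `forces` is a sequence of (fx, fy, fz) vectors, one per atom.
    """
    return all(math.sqrt(fx * fx + fy * fy + fz * fz) < fmax
               for fx, fy, fz in forces)

print(is_converged([(0.05, 0.0, 0.0), (0.0, 0.09, 0.0)]))  # every norm < 0.1
print(is_converged([(0.1, 0.1, 0.0)]))                     # norm is about 0.141
```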

4.1.4.13. MLPKeepSym

MLPKeepSym = bool

Whether to keep the symmetry when conducting optimization with an MLP.

Default: false

4.1.5. Parameters for dispatcher in CALYPSO.DISPATCHER block

4.1.5.1. MachineList

MachineList = list of string

This parameter defines the available computational resources. For example, if the cluster you are using has two usable queues, you can set up at most two machine.json files so that structure optimizations run in parallel.

MachineList = ["./machine-1.json", "./machine-2.json"]

There is no default value for MachineList. One must set it manually.

4.1.5.2. TimeInterval

TimeInterval = int

How often the dispatcher will check the status of the jobs.

Default: 10

4.1.5.3. TmpPath

TmpPath = string

The path to save the log file of Orchestrator (dispatcher).

Default: "BackStage"

4.1.6. Parameters for descriptor in CALYPSO.DESCRIPTOR block

4.1.6.1. SimThreshold

SimThreshold = float

Defines the similarity threshold between two structures. If the distance between two structures is less than the threshold, they are considered to be the same structure.

Default: 0.01
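
How such a threshold removes duplicates can be sketched as follows. Everything here is an assumption made for illustration: the descriptors are treated as plain numeric vectors, their distance is taken to be Euclidean, and the function names are hypothetical. The actual distance used with the BCM and CCF descriptors is not described in this section.

```python
import math

def is_same_structure(desc_a, desc_b, sim_threshold=0.01):
    """Treat two structures as identical if their descriptor distance
    is below sim_threshold (Euclidean distance is an assumption)."""
    return math.dist(desc_a, desc_b) < sim_threshold

def drop_duplicates(descriptors, sim_threshold=0.01):
    """Return the indices kept after removing near-identical structures."""
    kept = []
    for i, desc in enumerate(descriptors):
        # Keep a structure only if it differs from everything kept so far.
        if not any(is_same_structure(desc, descriptors[j], sim_threshold)
                   for j in kept):
            kept.append(i)
    return kept

print(drop_duplicates([[0.0, 0.0], [0.0, 0.005], [1.0, 0.0]]))
```

A smaller threshold makes fewer structures count as duplicates, which is why lowering SimThreshold can help when a similarity warning is raised.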

4.2. CALYPSO Outputs

All major output files are stored in the “results” folder:

File Name            Description
Analysis_Output.csv  The results file of the predicted structures.
database.db          Contains the intermediate parameters of CALYPSO.
descriptor.pkl       Includes the descriptor information of each structure.
ini.json             Includes the initial structure information.
opt.json             Includes the optimized structure information and the corresponding energy, forces, and so on.
opt_task             All structure optimizations are saved in this folder.

4.3. Analysis of Results

CALYPSO calculations typically generate a large number of structures, so a versatile tool for data analysis is necessary.

For this purpose we developed the CALYPSO ANALYSIS KIT (CAK), which allows automatic structure analysis.

Once CALYPSO is installed, pycak is available from the command line.

> cd path-to-calculation/results
> pycak --help
usage: pycak [-h] [-d DIR] [--refene REFENE_FILE] [-m TOL [TOL ...]] [-a] [--reduce-sim] [--energy-threshold ENERGY_THRESHOLD] [--pcell] [--ucell] [--vasp]
             [--synth] [--synth-model-dir SYNTHESISABILITY_MODEL_DIR]

CALYPSO Analysis Toolkits
-------------------------
Analysis CALYPSO results

Examples:
    pycak -m 0.1 0.01 a --ucell --vasp

Optional: analysis synthesisability
    This requires pytorch etc. being installed. See detailed instruction in
    <https://iccms-calypso.github.io/CALYPSO-Python/posts/_installation.html#installation>
Then download and decompress the model archive into the default cache directory:
    MODEL_ARCHIVE_URL=https://github.com/ICCMS-CALYPSO/open-resources/releases/download/CALYPSO-v10.0.0-alpha.1/synth-ckpt-v1.0.0.tar.gz
    PROJECT_CACHEDIR=/Users/wangzhenyu/.cache/calypso
    curl -L $MODEL_ARCHIVE_URL | tar -C $PROJECT_CACHEDIR -zxf -

options:
  -h, --help            show this help message and exit
  -d DIR, --results-dir DIR
                        path to the results directory (default: .)
  --refene REFENE_FILE  reference energy (enthalpy) for energy above hull (default: ../refene.txt)
  -m TOL [TOL ...], --multi-tolerance TOL [TOL ...]
                        tolerances for analysising symmetry;
                        multiple values are acceptable; some useful
                        values: 1.0, 0.5, 0.1, 0.01, 0.001; (default: 0.1)
  -a, --all             analysis all structures; by default only the
                        50 lowest energy structures are considered
  --reduce-sim          reduce similarity using energy threshold
  --energy-threshold ENERGY_THRESHOLD
                        energy threshold (eV) of reducing similarity; below which
                        two structures are considered duplicates (default: 1e-3)

output format:
  --pcell               write primcell cell
  --ucell               write unit cell; If neither pcell nor ucell are specified,
                        ucell is switched on
  --vasp                write structure in vasp format

analysis synthesisability:
  --synth               whether to analyse synthesisability with machine learning model
  --synth-model-dir SYNTHESISABILITY_MODEL_DIR
                        directory to model parameters for synthesisability model
                        (default: /Users/wangzhenyu/.cache/calypso/synth-ckpt-v1.0.0)

> pycak

An output file named “Analysis_Output.dat” will be generated.

> cat Analysis_Output.dat
 idx   caly_name      formula       enth_per_atom      fitness      volume_per_atom   density   min_dis  spg(0.1)  spgnum(0.1) natom(0.1)
  0    caly_1         Li1H7N1          -3.024          -27.217           9.507         0.543     1.053  P3m1          156         9     
  1    caly_0         Li2H11N1         -3.139          -43.948           7.158         0.646     0.749  C2             5          28   
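
The table above is whitespace-separated, so it can be post-processed with a few lines of Python. The sketch below is not part of pycak: the function name is hypothetical, and it assumes that no column value contains spaces, which holds for the sample shown here.

```python
# The sample rows are copied from the listing above.
SAMPLE = """\
 idx   caly_name      formula       enth_per_atom      fitness      volume_per_atom   density   min_dis  spg(0.1)  spgnum(0.1) natom(0.1)
  0    caly_1         Li1H7N1          -3.024          -27.217           9.507         0.543     1.053  P3m1          156         9
  1    caly_0         Li2H11N1         -3.139          -43.948           7.158         0.646     0.749  C2             5          28
"""

def read_analysis_table(text):
    """Parse a whitespace-separated table into a list of dicts (all strings)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

rows = read_analysis_table(SAMPLE)
# Pick the structure with the lowest enthalpy per atom.
best = min(rows, key=lambda row: float(row["enth_per_atom"]))
print(best["caly_name"], best["formula"], best["spg(0.1)"])
```

For a real run, the same function can be applied to the text read from Analysis_Output.dat in the results folder.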

4.4. Orchestrator —— CALYPSO task dispatcher

To make CALYPSO more flexible, we developed a task dispatcher that lets users submit CALYPSO jobs in more ways.

Orchestrator mainly depends on one input file, machine.json, which defines how to reach the computational resources and how to run calculations on them.

Here are the parameters of machine.json:

4.4.1. common parameters

4.4.1.1. name

name = string

Name of this computational resource, useful when you have multiple computational resources.

Default: “Machine”