4. Input Parameters
Basic inputs and outputs
4.1. CALYPSO Inputs (TOML)
The main input file, named input.toml, contains all the parameters necessary for the structure prediction. The file consists of input tags that can be given in any order; tags that are omitted take their default values. Below is a quick overview of the tag syntax:
The general syntax follows TOML; more information about this file format can be found in the TOML specification. Tag names are case-insensitive.
All text following the "#" character is treated as a comment.
Logical values can be given as t (or true) or f (or false).
The value null is allowed.
Brief descriptions of the input parameters are given below.
4.1.1. Common parameters in the CALYPSO block
4.1.1.1. SystemName
SystemName = string
A description string for the target system (max. 40 characters).
Default: CALYPSO
4.1.1.2. Seed
seed = integer
A positive integer sets the random seed for reproducibility; a negative integer leaves the seed unset.
Default: -1
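The Seed convention maps directly onto a seeded random-number generator. The sketch below uses Python's random module to illustrate the rule; make_rng is a hypothetical helper rather than CALYPSO's code, and treating seed = 0 as "not set" is an assumption because the text only covers positive and negative values.

```python
import random


def make_rng(seed=-1):
    """Return a random generator following the Seed rule described above.

    Positive seed: fixed, reproducible stream.
    Otherwise: seeded from system entropy, so runs are not reproducible.
    Treating seed == 0 as "not set" is an assumption.
    """
    return random.Random(seed if seed > 0 else None)


first, second = make_rng(42), make_rng(42)
print([first.random() for _ in range(3)] == [second.random() for _ in range(3)])  # True
```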
4.1.1.3. IType
IType = int or string
Controls the type of structures to be generated.
IType int | IType string | Module |
---|---|---|
1 | CRYSTAL | Crystal structure prediction |
2 | CLUSTER | Cluster structure prediction |
3 | MOLECULAR | Molecular crystal structure prediction |
4 | LAYER | Layer (including film) structure prediction |
Either the integer or the string form can be used to specify the type of structure prediction; a string value must be uppercase.
Default: 1
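Because IType accepts either form, a configuration reader has to map both onto one value. The sketch below does this with the table above; normalize_itype is a hypothetical helper, and rejecting lowercase strings simply enforces the uppercase rule stated here.

```python
# Integer/string pairs taken from the IType table above.
ITYPE_CODES = {"CRYSTAL": 1, "CLUSTER": 2, "MOLECULAR": 3, "LAYER": 4}


def normalize_itype(value):
    """Return the integer IType for an int or an uppercase string value."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("IType must be an int or a string")
    if isinstance(value, int):
        if value not in ITYPE_CODES.values():
            raise ValueError(f"unknown IType: {value}")
        return value
    if isinstance(value, str):
        if value != value.upper():
            raise ValueError(f"string IType must be uppercase, got {value!r}")
        if value not in ITYPE_CODES:
            raise ValueError(f"unknown IType: {value!r}")
        return ITYPE_CODES[value]
    raise TypeError("IType must be an int or a string")


print(normalize_itype("LAYER"), normalize_itype(1))  # 4 1
```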
4.1.1.4. ICode
ICode = integer or string
Defines which code is used for local structure optimization during the structure prediction.
- 1:
VASP
- 3:
GULP
- 4:
PWSCF
- 9:
LAMMPS
- 15:
MLP
Default: 1
4.1.1.5. IAlgo
IAlgo = integer or string
Defines which search algorithm is adopted in the simulation.
- 1:
global PSO algorithm
- 2:
local PSO algorithm
- 3:
ABC algorithm with symmetry
Default: 2
4.1.1.6. IDisp
IDisp = integer or string
- 1:
ORCH
The built-in CALYPSO task dispatcher. Support for other third-party libraries is planned.
Default: 1
4.1.1.7. IFit
IFit = integer or string
Defines the fitness that determines how the population evolves.
- 1:
ENTHALPY
- 2:
HARDNESS
- 3:
GIBBS
Default: 1
4.1.1.8. IRunner
IRunner = int
Defines how CALYPSO is run.
- 1:
run automatically
- 2:
run each step manually (split mode)
Default: 1
4.1.1.9. ISim
ISim = int or string
Defines the structure descriptor used to determine whether two structures are similar.
- 0:
NAN
- 1:
BCM
- 2:
CCF
BCM is faster than CCF, so we suggest using BCM in most cases.
If a similarity warning appears when generating structures, decrease the value of SimThreshold or turn off the similarity comparison by setting ISim = 0.
Default: 1
4.1.1.10. BlockMode
BlockMode = bool
Defines when the evolution step is performed.
- true:
evolution is performed after the whole generation is finished.
- false:
evolution is performed as soon as the local optimization of each structure is finished.
Warning
Currently only BlockMode = true is supported.
Default: true
4.1.1.11. PickUp
PickUp = bool
Whether to pick up (resume) a calculation. CALYPSO supports pickup at any stage; just turn this on.
Note that PickUp can resume not only an aborted CALYPSO task but also a finished one with a new (e.g., larger) MaxStep, which lets you keep the existing evolution information and continue the run.
- true:
pick up the previous calculation.
- false:
start a new calculation.
Default: false
4.1.2. Parameters for evolution in the CALYPSO.EVO block
4.1.2.1. NBest
NBest = int
Defines into how many parts the potential energy surface (PES) is divided; PSO moves toward the closest one to generate the next structure.
In global PSO, NBest is equal to 1.
Default: 4
4.1.2.2. PsoRatio
PsoRatio = float
Defines the fraction of structures in each generation that are produced by PSO.
The remaining structures are randomly generated with symmetry constraints.
Default: 0.6
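PsoRatio fixes how one generation of PopSize structures is split between PSO and random generation. This minimal sketch assumes the PSO count is rounded to the nearest integer, which is not specified here; split_generation is a hypothetical helper.

```python
def split_generation(pop_size, pso_ratio):
    """Split one generation into (PSO-generated, randomly generated) counts."""
    if not 0.0 <= pso_ratio <= 1.0:
        raise ValueError("PsoRatio must lie between 0 and 1")
    # Assumed rounding rule; Python's round() sends halves to the nearest even integer.
    n_pso = round(pop_size * pso_ratio)
    return n_pso, pop_size - n_pso


print(split_generation(10, 0.6))  # (6, 4)
```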
4.1.2.3. SabcRatio
SabcRatio = list of float
Defines the fractions of scouts, employees, and onlookers, where:
scouts choose a different space group
onlookers choose a different combination of Wyckoff positions
employees choose different atomic coordinates for the Wyckoff positions
Make sure that the three values sum to 1.0.
Default: [0.3, 0.2, 0.5]
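A sketch of checking SabcRatio and turning it into structure counts. It assumes the list order is [scouts, employees, onlookers], following the order in the sentence above, and uses largest-remainder rounding (also an assumption) so the counts always add up to the requested total; split_sabc is a hypothetical helper.

```python
import math


def split_sabc(n_structures, sabc_ratio):
    """Split n_structures into (scouts, employees, onlookers) counts."""
    if len(sabc_ratio) != 3:
        raise ValueError("SabcRatio needs exactly three values")
    if not math.isclose(sum(sabc_ratio), 1.0, abs_tol=1e-9):
        raise ValueError("SabcRatio values must sum to 1.0")
    exact = [n_structures * r for r in sabc_ratio]
    counts = [math.floor(x) for x in exact]
    leftover = n_structures - sum(counts)
    # Give the structures lost to flooring to the largest fractional parts.
    order = sorted(range(3), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return tuple(counts)


print(split_sabc(10, [0.3, 0.2, 0.5]))  # (3, 2, 5)
```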
4.1.2.4. PopSize
PopSize = integer
The population size, i.e., the total number of structures per generation.
Normally, a larger system needs a larger population size. A very large population size should be used for simulations with automatic variation of chemical compositions.
Default: 10
4.1.2.5. MaxStep
MaxStep = integer
The maximum number of generations to be executed for the entire structure prediction simulation.
Typically, a larger system needs more generations.
Default: 2
4.1.2.6. Temperature
Temperature = 300
The temperature used when the Gibbs free energy is chosen as the fitness (IFit = 3). The algorithm can be found here.
The unit is Kelvin.
Default: 300
4.1.3. Parameters for the generator in the CALYPSO.GENERATOR block
4.1.3.1. Basic parameters for each type of structure prediction
4.1.3.2. FormulaUnit
FormulaUnit = list of string
For example, FormulaUnit = ['(LiH4)1-2(NH3)3-4'] means that LiH4-NH3 structures are predicted with 1 to 2 LiH4 units and 3 to 4 NH3 units.
For crystal structure prediction, FormulaUnit contains one entry; for layer structure prediction, the length of FormulaUnit equals the number of layers.
There is no default value. You must define it.
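The range syntax can be expanded into explicit compositions. The sketch below handles only the parenthesized "(unit)min-max" form from the example (not plain formulas such as "MoS2" or the braced molecular form "{Water}4"), and expand_formula is a hypothetical helper, not CALYPSO's parser.

```python
import itertools
import re
from collections import Counter

# "(unit)min-max" or "(unit)n"; only the parenthesized form is handled here.
GROUP = re.compile(r"\(([A-Za-z0-9]+)\)(\d+)(?:-(\d+))?")
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_unit(unit):
    """Count the atoms in one unit, e.g. 'NH3' gives N: 1, H: 3."""
    counts = Counter()
    for symbol, n in ELEMENT.findall(unit):
        counts[symbol] += int(n) if n else 1
    return counts


def expand_formula(formula):
    """Yield (multipliers, composition) for every combination of unit counts."""
    groups = [(unit, int(lo), int(hi or lo)) for unit, lo, hi in GROUP.findall(formula)]
    for mult in itertools.product(*(range(lo, hi + 1) for _, lo, hi in groups)):
        total = Counter()
        for (unit, _, _), m in zip(groups, mult):
            for symbol, n in parse_unit(unit).items():
                total[symbol] += n * m
        yield mult, dict(total)


combos = dict(expand_formula("(LiH4)1-2(NH3)3-4"))
print(sorted(combos))  # [(1, 3), (1, 4), (2, 3), (2, 4)]
print(combos[(2, 4)])  # {'Li': 2, 'H': 20, 'N': 4}
```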
4.1.3.3. MaxNumAtom
MaxNumAtom = integer
The maximum number of atoms allowed in the simulation cell.
Default: 100
4.1.3.4. VolumeUnit
VolumeUnit = dict of string and int
Custom volume of each unit. Setting 0 or leaving it empty means that the volume is calculated from the covalent radius r (only available for single elements) as 1.3*(4/3)πr^3.
For example, VolumeUnit = {Li=10, H=10, N=10} means that the volumes of the Li, H, and N atoms are all equal to 10.
Warning
Keys of a TOML inline table do not need to be quoted.
Default: {} (i.e., 1.3*(4/3)π(covalent radius)^3)
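The default volume rule is a one-line formula. In this sketch the radius passed in is a placeholder (1.0 angstrom), not a tabulated covalent radius, and the volume unit is assumed to be cubic angstrom; both helper names are hypothetical.

```python
import math


def default_unit_volume(covalent_radius):
    """Default per-atom volume 1.3 * (4/3) * pi * r**3 (r assumed in angstrom)."""
    return 1.3 * (4.0 / 3.0) * math.pi * covalent_radius ** 3


def resolve_volume(volume_unit, element, covalent_radius):
    """Use the VolumeUnit entry if present and non-zero, else the radius-based default."""
    custom = volume_unit.get(element, 0)
    return custom if custom else default_unit_volume(covalent_radius)


# The radius below is a placeholder, not a tabulated covalent radius.
print(round(default_unit_volume(1.0), 3))  # 5.445
print(resolve_volume({"Li": 10}, "Li", 1.0))  # 10
```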
4.1.3.5. DistanceOfIon
DistanceOfIon = list or dict
Minimum interatomic distances (in angstrom), given either as an (n+1)x(n+1) matrix or as a dict.
For example, DistanceOfIon = [["X", "Li", "H", "N"], ["Li", 1.0, 1.0, 1.0], ["H", 1.0, 1.0, 1.0], ["N", 1.0, 1.0, 1.0]] is equivalent to DistanceOfIon = {Li = 0.5, N = 0.5, H = 0.5}.
Default: {} (covalent radii are used)
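In the example above, each matrix entry (1.0 angstrom) equals the sum of two dict values (0.5 + 0.5), which suggests that the dict form gives a per-element value and that the minimum distance of a pair is the sum of the two. That rule is an inference from the example, not something stated in this documentation, so treat the sketch below as hypothetical: it converts both forms into a table of pair distances and checks that they agree.

```python
from itertools import combinations_with_replacement


def pairs_from_matrix(matrix):
    """Convert the (n+1)x(n+1) header matrix into {(A, B): d_min}.

    Assumes a symmetric matrix; pair keys are stored in sorted order.
    """
    header = matrix[0][1:]
    table = {}
    for row in matrix[1:]:
        a, values = row[0], row[1:]
        for b, d in zip(header, values):
            table[tuple(sorted((a, b)))] = float(d)
    return table


def pairs_from_dict(values):
    """Assumed rule: d_min(A, B) = values[A] + values[B]."""
    return {
        (a, b): values[a] + values[b]
        for a, b in combinations_with_replacement(sorted(values), 2)
    }


matrix = [["X", "Li", "H", "N"],
          ["Li", 1.0, 1.0, 1.0],
          ["H", 1.0, 1.0, 1.0],
          ["N", 1.0, 1.0, 1.0]]
print(pairs_from_matrix(matrix) == pairs_from_dict({"Li": 0.5, "N": 0.5, "H": 0.5}))  # True
```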
4.1.3.6. SpaceGroup
SpaceGroup = list of int and string
Defines the range of space groups to be considered.
Space groups are specified as follows:
a single integer means a single space group number
"int1-int2" means space group numbers ranging from int1 to int2
"int1:int2:int3" means space group numbers ranging from int1 to int2 with step size int3, over the half-open interval [int1, int2)
Note
There are some differences between the structure generation methods:
- crystal (IType = 1): SpaceGroup ranges from 1 to 230.
- cluster (IType = 2): SpaceGroup ranges from 1 to 31.
- molecular crystal (IType = 3): SpaceGroup ranges from 1 to 230.
- layer (IType = 4): SpaceGroup ranges from 1 to 17 for multi-layer and from 1 to 230 for single layer.
Default: [1, "2-210", "211:231:1"]
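The three SpaceGroup forms can be expanded as follows. The sketch assumes the "int1-int2" form includes both ends (consistent with the default list covering 1 to 230 without gaps), while "int1:int2:int3" excludes int2 as stated; expand_space_groups is a hypothetical helper.

```python
def expand_space_groups(spec, max_number=230):
    """Expand a SpaceGroup list into a sorted list of space group numbers.

    int              -> that single space group
    "int1-int2"      -> int1 .. int2, both ends included (assumed)
    "int1:int2:step" -> range(int1, int2, step), i.e. half-open [int1, int2)
    max_number: 230 for crystals; 31 for clusters or 17 for multi-layers per the Note.
    """
    numbers = set()
    for item in spec:
        if isinstance(item, int):
            numbers.add(item)
        elif ":" in item:
            start, stop, step = (int(part) for part in item.split(":"))
            numbers.update(range(start, stop, step))
        elif "-" in item:
            start, stop = (int(part) for part in item.split("-"))
            numbers.update(range(start, stop + 1))
        else:
            numbers.add(int(item))
    out_of_range = sorted(n for n in numbers if not 1 <= n <= max_number)
    if out_of_range:
        raise ValueError(f"space groups outside 1-{max_number}: {out_of_range}")
    return sorted(numbers)


print(len(expand_space_groups([1, "2-210", "211:231:1"])))  # 230
print(expand_space_groups(["1:10:3", 16]))  # [1, 4, 7, 16]
```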
4.1.3.7. PrototypePath
PrototypePath = list of string
Paths containing the prototype structures (files ending with .vasp).
For example, PrototypePath = ["path/to/vasp/poscar"]. At the very beginning, the code parses the provided paths and saves the structures into ~/.cache/calypso/prototype in files named {number of atoms}.csv, so all structures with the same number of atoms are stored in the same file.
There is no default value. You must supply this variable if you want to use it.
4.1.3.8. PrototypeRatio
PrototypeRatio = float
The fraction of prototype-based structures among the randomly generated structures.
Default: 0.0
4.1.3.9. Detailed parameters for bulk structures
4.1.3.10. LengthMaxRatio
LengthMaxRatio = float
The maximum ratio between the lattice lengths a, b, and c.
Default: 5.0
4.1.3.11. LengthMinRatio
LengthMinRatio = float
The minimum ratio between the lattice lengths a, b, and c.
Default: 1.0
4.1.3.12. Extra Parameters for layer structure prediction
4.1.3.13. Thicknesses
Thicknesses = list of float
The thicknesses of thin films (in unit of angstrom).
The length of Thicknesses must equal the length of FormulaUnit.
There is no default value. You must supply this variable if IType = 4.
4.1.3.14. Area
Area = float
The area (in unit of angstrom^2) per formula unit.
If you cannot provide a good estimate of the area, please use the default value; the program will automatically estimate the area from the ionic radii of the given atoms.
There is no default value. You must supply this variable if IType = 4.
4.1.3.15. Gaps
Gaps = list of float
The gap between two layers, i.e., the interlayer distance (in angstrom). The length of Gaps should equal the length of FormulaUnit, and the last value of Gaps is always the vacancy value.
For example, with FormulaUnit = ["MoS2", "CrI3"], the gaps can be set as Gaps = [2, 10], which means that the distance between the MoS2 and CrI3 layers is 2 angstrom and the vacancy is 10 angstrom.
There is no default value. You must supply this variable if IType = 4.
4.1.3.16. Extra Parameters for cluster structure prediction
4.1.3.17. Vacancy
Vacancy = list of float
The isolated cluster is placed into an orthorhombic box where the periodic boundary condition is applied.
This variable defines the separations (in unit of angstrom) between the studied cluster and its nearest-neighboring periodic images. It should be large enough to ensure that interactions between the studied cluster and its nearest-neighboring images are negligible.
For cluster structure prediction of large systems, we do not recommend using VASP for the structure optimization, since VASP calculations are computationally very expensive.
Default: [10.0, 10.0, 10.0]
4.1.3.18. ClusterType
ClusterType = string
- normal:
the core-shell type cluster
- cage:
the cage cluster
- plane:
the plane cluster
Default: normal
4.1.3.19. Extra Parameters for molecular structure prediction
4.1.3.20. MoleculesPath
MoleculesPath = dict of string
Paths to the molecule files. Molecular names in FormulaUnit are resolved through the keys of this dict.
For example, with FormulaUnit = ["{Water}4"] and MoleculesPath = {'Water'='./H2O.xyz'}, Water is parsed as the H2O molecule read from ./H2O.xyz.
Default: {}
4.1.4. Parameters for optimization in the CALYPSO.OPT block
4.1.4.1. DFTInputPath
DFTInputPath = string
The path containing the input files for the DFT code.
If an MLP with a model file is used, the model file should also be saved here.
Default: "./"
4.1.4.2. JobFlow
JobFlow = list of string
Defines the sequence of calculations to be conducted. The number of input files must equal the length of JobFlow.
Default: ["opt", "opt", "opt"]
4.1.4.3. PpMap
PpMap = dict of string
Maps each element to the path of its pseudopotential file. This currently only works for VASP.
For example, PpMap = {Li = "POTCAR_Li", Mg = "mmm"}
There is no default value. One must set it manually.
4.1.4.5. Extra Parameters for the MLP calculator
4.1.4.6. MLPType
MLPType = "dp"
- dp:
deep potential
- deepmd:
deep potential
- dpa:
deep potential
- dpa2:
deep potential
- m3gnet:
- chgnet:
- mace_mp:
- mace_off:
- gulp:
- emt:
- lj:
- morse:
Chooses which type of MLP is used.
There is no default value. One must set it manually.
4.1.4.7. MLPParams
MLPParams = {"model"="M3GNet-MP-2021.2.8-PES"}
The parameters used to initialize the MLP, for example:
chgnet: {"model"="0.3.0", "check_cuda_mem"=true, "on_isolated_atoms"="warn"}
dp: {"model"="path/to/model"}
Default: {}
4.1.4.8. OptAlgo
OptAlgo = string
The optimization algorithm.
- LBFGS:
- FIRE:
- BFGS:
Default: "LBFGS"
4.1.4.9. OptStep
OptStep = int
The maximum number of optimization steps.
Default: 1000
4.1.4.10. TrajFile
TrajFile = string
The file name of the optimization trajectory.
Default: traj.traj
4.1.4.11. Pstress
Pstress = float
The external pressure (in GPa) applied during MLP structure optimization.
Default: 0.0
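For reference, structures relaxed under an external pressure P are compared by their enthalpy, H = E + PV. The sketch below shows the unit conversion from GPa to eV/angstrom^3 (1 eV/angstrom^3 = 160.2176634 GPa); the function names are hypothetical and the numbers are purely illustrative.

```python
# 1 eV/angstrom^3 = 1.602176634e-19 J / 1e-30 m^3 = 1.602176634e11 Pa = 160.2176634 GPa
GPA_PER_EV_PER_A3 = 160.2176634


def pv_term(volume_a3, pstress_gpa):
    """P*V in eV for a volume in angstrom^3 and a pressure in GPa."""
    return pstress_gpa / GPA_PER_EV_PER_A3 * volume_a3


def enthalpy(energy_ev, volume_a3, pstress_gpa=0.0):
    """H = E + P*V in eV; reduces to E when Pstress = 0."""
    return energy_ev + pv_term(volume_a3, pstress_gpa)


# Illustrative numbers only: 10 GPa acting on 10 angstrom^3 adds about 0.624 eV.
print(round(pv_term(10.0, 10.0), 3))  # 0.624
```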
4.1.4.12. Fmax
Fmax = float
The convergence criterion. The optimization stops when the force on every atom is smaller than Fmax.
Default: 0.1
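The stopping rule can be written as a small predicate. This sketch assumes forces are given per atom as (fx, fy, fz) in eV/angstrom and that OptStep acts as an upper limit on the number of steps; max_force and should_stop are hypothetical helpers, not the optimizer's own code.

```python
import math


def max_force(forces):
    """Largest per-atom force norm |F_i| from a list of (fx, fy, fz) tuples."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def should_stop(forces, step, fmax=0.1, opt_step=1000):
    """Stop when every atomic force is below Fmax or OptStep steps are reached."""
    return max_force(forces) < fmax or step >= opt_step


forces = [(0.05, 0.0, 0.0), (0.0, 0.02, 0.03)]
print(should_stop(forces, step=12))  # True
```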
4.1.4.13. MLPKeepSym
MLPKeepSym = bool
Whether to keep the symmetry during MLP structure optimization.
Default: false
4.1.5. Parameters for the dispatcher in the CALYPSO.DISPATCHER block
4.1.5.1. MachineList
MachineList = list of string
This parameter defines the available computational resources. For example, if the cluster you are using has two queues available, you can set up at most two machine.json files to run the structure optimizations in parallel.
MachineList = ["./machine-1.json", "./machine-2.json"]
There is no default value for MachineList. One must set it manually.
4.1.5.2. TimeInterval
TimeInterval = int
How often the dispatcher checks the status of the jobs.
Default: 10
4.1.5.3. TmpPath
TmpPath = string
The path to save the log file of Orchestrator (dispatcher).
Default: "BackStage"
4.1.6. Parameters for the descriptor in the CALYPSO.DESCRIPTOR block
4.1.6.1. SimThreshold
SimThreshold = float
Defines the similarity threshold between two structures. If the distance between two structures is less than the threshold, they are considered the same structure.
Default: 0.01
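A sketch of how a distance threshold turns into duplicate removal. BCM and CCF define their own descriptors and distances; here a plain Euclidean distance between hypothetical descriptor vectors stands in for them, and the greedy keep-first strategy is an assumption.

```python
import math


def is_similar(desc_a, desc_b, sim_threshold=0.01):
    """Two structures count as the same if their descriptor distance is below the threshold."""
    return math.dist(desc_a, desc_b) < sim_threshold


def unique_indices(descriptors, sim_threshold=0.01):
    """Keep the first structure of every group of near-duplicates (greedy)."""
    kept = []
    for i, desc in enumerate(descriptors):
        if not any(is_similar(desc, descriptors[j], sim_threshold) for j in kept):
            kept.append(i)
    return kept


# Hypothetical two-component descriptors for three structures.
print(unique_indices([[0.0, 0.0], [0.005, 0.0], [1.0, 0.0]]))  # [0, 2]
```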
4.2. CALYPSO Outputs
All major output files are located in the "results" folder:
File Name | Description |
---|---|
Analysis_Output.csv | The results file of the predicted structures. |
database.db | Contains the intermediate parameters of CALYPSO. |
descriptor.pkl | Contains the descriptor information of each structure. |
ini.json | Contains information on the initial structures. |
opt.json | Contains the optimized structures and the corresponding energies, forces, and so on. |
opt_task | Folder in which all structure optimizations are saved. |
4.3. Analysis of Results
CALYPSO calculations typically generate a large number of structures, so a versatile tool for data analysis is needed. We have therefore developed the CALYPSO Analysis Kit (CAK) for automatic structure analysis.
Once CALYPSO is installed, pycak is available from the command line.
> cd path-to-calculation/results
> pycak --help
usage: pycak [-h] [-d DIR] [--refene REFENE_FILE] [-m TOL [TOL ...]] [-a] [--reduce-sim] [--energy-threshold ENERGY_THRESHOLD] [--pcell] [--ucell] [--vasp]
[--synth] [--synth-model-dir SYNTHESISABILITY_MODEL_DIR]
CALYPSO Analysis Toolkits
-------------------------
Analysis CALYPSO results
Examples:
pycak -m 0.1 0.01 a --ucell --vasp
Optional: analysis synthesisability
This requires pytorch etc. being installed. See detailed instruction in
<https://iccms-calypso.github.io/CALYPSO-Python/posts/_installation.html#installation>
Then download and decompress the model archive into the default cache directory:
MODEL_ARCHIVE_URL=https://github.com/ICCMS-CALYPSO/open-resources/releases/download/CALYPSO-v10.0.0-alpha.1/synth-ckpt-v1.0.0.tar.gz
PROJECT_CACHEDIR=/Users/wangzhenyu/.cache/calypso
curl -L $MODEL_ARCHIVE_URL | tar -C $PROJECT_CACHEDIR -zxf -
options:
-h, --help show this help message and exit
-d DIR, --results-dir DIR
path to the results directory (default: .)
--refene REFENE_FILE reference energy (enthalpy) for energy above hull (default: ../refene.txt)
-m TOL [TOL ...], --multi-tolerance TOL [TOL ...]
tolerances for analysising symmetry;
multiple values are acceptable; some useful
values: 1.0, 0.5, 0.1, 0.01, 0.001; (default: 0.1)
-a, --all analysis all structures; by default only the
50 lowest energy structures are considered
--reduce-sim reduce similarity using energy threshold
--energy-threshold ENERGY_THRESHOLD
energy threshold (eV) of reducing similarity; below which
two structures are considered duplicates (default: 1e-3)
output format:
--pcell write primcell cell
--ucell write unit cell; If neither pcell nor ucell are specified,
ucell is switched on
--vasp write structure in vasp format
analysis synthesisability:
--synth whether to analyse synthesisability with machine learning model
--synth-model-dir SYNTHESISABILITY_MODEL_DIR
directory to model parameters for synthesisability model
(default: /Users/wangzhenyu/.cache/calypso/synth-ckpt-v1.0.0)
> pycak
An output file named "Analysis_Output.dat" is generated.
> cat Analysis_Output.dat
idx caly_name formula enth_per_atom fitness volume_per_atom density min_dis spg(0.1) spgnum(0.1) natom(0.1)
0 caly_1 Li1H7N1 -3.024 -27.217 9.507 0.543 1.053 P3m1 156 9
1 caly_0 Li2H11N1 -3.139 -43.948 7.158 0.646 0.749 C2 5 28
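Since Analysis_Output.dat is a whitespace-separated table, it is easy to post-process. The sketch below parses the two example rows shown above and picks the structure with the lowest enthalpy per atom; read_analysis_output is a hypothetical helper, and it assumes that no field contains spaces.

```python
# Column layout and rows copied from the example Analysis_Output.dat above.
SAMPLE = """\
idx caly_name formula enth_per_atom fitness volume_per_atom density min_dis spg(0.1) spgnum(0.1) natom(0.1)
0 caly_1 Li1H7N1 -3.024 -27.217 9.507 0.543 1.053 P3m1 156 9
1 caly_0 Li2H11N1 -3.139 -43.948 7.158 0.646 0.749 C2 5 28
"""


def read_analysis_output(text):
    """Parse a whitespace-separated table into a list of dicts keyed by column name."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


records = read_analysis_output(SAMPLE)
lowest = min(records, key=lambda rec: float(rec["enth_per_atom"]))
print(lowest["caly_name"], lowest["formula"], lowest["spg(0.1)"])  # caly_0 Li2H11N1 C2
```

For a real run, pass the file contents, e.g. pathlib.Path("Analysis_Output.dat").read_text(), instead of SAMPLE.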
4.4. Orchestrator: CALYPSO task dispatcher
To make CALYPSO more flexible, we have developed a task dispatcher that lets users submit CALYPSO jobs in more ways.
Orchestrator mainly depends on one input file, machine.json, which defines how to reach the computational resources and how to run calculations on them.
The parameters of machine.json are described below:
4.4.1. Common parameters
4.4.1.1. name
name = string
Name of this computational resource; useful when you have multiple computational resources.
Default: "Machine"