Extrapolation grade and active learning#

For any fitted ACE potential and corresponding training set (usually stored by pacemaker into fitting_data_info.pckl.gzip file in working directory) one can generate corresponding active set for linear B-projections (default) of full non-linear embedding. Practice shows that linear active set is enough for extrapolation grade estimation. However, if you want more sensitive (and "over-secure") extrapolation grade, then full active set could be used.

Active set generation#

Utility to generate active set (used for extrapolation grade calculation).

usage: pace_activeset [-h] [-d DATASET] [-f] [-b BATCH_SIZE] [-g GAMMA_TOLERANCE] [-i MAXVOL_ITERS] [-r MAXVOL_REFINEMENT] [-m MEMORY_LIMIT] potential_file

Utility to compute active set for PACE (.yaml) potential

positional arguments:
potential_file        B-basis file name (.yaml)

optional arguments:
   -h, --help            show this help message and exit
   -d DATASET, --dataset DATASET
   Dataset file name, ex.: filename.pckl.gzip
   -f, --full            Compute active set on full (linearized) design matrix
   -b BATCH_SIZE, --batch_size BATCH_SIZE
   Batch size (number of structures) considered simultaneously.If not provided - all dataset at once is considered
   -g GAMMA_TOLERANCE, --gamma_tolerance GAMMA_TOLERANCE
   Gamma tolerance
   -i MAXVOL_ITERS, --maxvol_iters MAXVOL_ITERS
   Number of maximum iteration in MaxVol algorithm
   -r MAXVOL_REFINEMENT, --maxvol_refinement MAXVOL_REFINEMENT
   Number of refinements (epochs)
   -m MEMORY_LIMIT, --memory-limit MEMORY_LIMIT
   Memory limit (i.e. 1GB, 500MB or 'auto')

Example of usage:

pace_activeset -d fitting_data_info.pckl.gzip output_potential.yaml

that will generate linear active set and store it into output_potential.asi file.

or

pace_activeset -d fitting_data_info.pckl.gzip output_potential.yaml -f

that will generate full active set (including linearized part of non-linear embedding function) and store it into output_potential.asi.nonlinear file.

Usage of active set with LAMMPS#

Example of usage of active set with LAMMPS

pair_style  pace/extrapolation
pair_coeff  * * output_potential.yaml output_potential.asi Al Cu

# compute per-atom extrapolation grade every 10 steps
fix pace_gamma all pair 10 pace/extrapolation gamma 1
# compute maximum extrapolation grade over complete structure
compute max_pace_gamma all reduce max f_pace_gamma

# dump extrapolative structures if c_max_pace_gamma > 5, skip otherwise, check every 20 steps 
variable dump_skip equal "c_max_pace_gamma < 5"
dump pace_dump all custom 20 extrapolative_structures.dump id type x y z f_pace_gamma
dump_modify pace_dump skip v_dump_skip

# stop simulation if maximum extrapolation grade exceeds 25
variable max_pace_gamma equal c_max_pace_gamma
fix extreme_extrapolation all halt 10 v_max_pace_gamma > 25

Check LAMMPS documentation for more details and example

With this setup you can run LAMMPS simulations and make use of per-atom extrapolation grade f_pace_gamma fix variable (i.e. in regular dump and visualization) or per-structure c_max_pace_gamma maximum extrapolation grade in thermo_style.

Two main scenarios:

Exploring new structures (and dump extrapolative structures with dump pace_dump). In that case extrapolative structures will be stored into extrapolative_structures.dump file, that could be loaded (i.e. with ASE) and DFT calculations could be performed with the tools of your choice. New DFT data can be added to training set and potential can be re-fitted (or up-fitted). This scenario is called active learning.
Performing normal simulations, observing extrapolation grade (printing c_max_pace_gamma variable) and stopping at extreme_extrapolation (with fix halt)

Usage of active set with ASE calculator#

from pyace import *

calc = PyACECalculator("output_potential.yaml")
calc.set_active_set("output_potential.asi")

# set calculator to ASE atoms
atoms.set_calculator(calc)

# trigger calculation
atoms.get_potential_energy()

#per-atom extrapolation grades are stored in
calc.results["gamma"]

Structure selection with D-optimality#

Once you have list of extrapolative structures (i.e. extrapolative_structures.dump from LAMMPS), you can select optimal subset of them using D-optimality criterion. For doing this, use pace_select command:

pace_select -p potential.yaml -a potential.asi -e "Ni Nb" -m 100 extrapolative_structures_1.dump extrapolative_structures_2.dump

that will load list of candidates structures from extrapolative_structures_1.dump and extrapolative_structures_2.dump files.

Option -e "Ni Nb" specifies mapping of LAMMPS atoms types 1 and 2 to "Ni" and "Nb" (this should correspond to species order in pair_coeff * * ... Ni Nb). NOTE! List of elements should be in quotes.

If no -a ... option is provided, then ONLY atomic environments from list of candidates structures in extrapolative_structures_*.dump will be considered for structure selection. If option -a potential.asi is included, then previously selected atomic environments (that form active set) will be added into list of all atomic environments before performing MaxVol algorithm.

Finally, all structures will be sorted by number of selected atomic environments in it and top 100 (specified by option -m ...) structures will be selected and stored into selected.pkl.gz.

NOTE! If list of candidates structures in is too big, it is better to split it by batches adding -b NUMBER_OF_STRUCTURES_PER_BATCH option.