API Reference#

Top-level package#

atomRDF — ontology-based knowledge graphs for atomistic simulation data.

atomRDF combines pyscal3, ASE and RDFLib with the OCDO Conceptual Dictionary ontologies (CMSO, CDCO, PODO, PLDO, LDO, ASMO) to make atomic-scale samples and the simulation workflows that produce them queryable as RDF.

The public entry points re-exported here form the stable v1 API:

KnowledgeGraph – main container; create samples, add provenance, run SPARQL, persist to disk.
WorkflowParser – ingest external workflow descriptions and add them to a KnowledgeGraph.

For a quick tour see examples/01_getting_started.ipynb or the online documentation at https://atomrdf.pyscal.org.

class atomrdf.KnowledgeGraph(graph_file=None, store='Memory', store_file=None, identifier='http://default_graph', ontology=None, structure_store=None, enable_log=False)[source]#

Represents a knowledge graph.

Parameters:

graph_file (str, optional) – The path to the graph file to be parsed. Default is None.
store (str, optional) – The type of store to use. Default is “Memory”.
store_file (str, optional) – The path to the store file. Default is None.
identifier (str, optional) – The identifier for the graph. Default is “http://default_graph”.
ontology (Ontology, optional) – The ontology object to be used. Default is None.
structure_store (StructureStore, optional) – The structure store object to be used. Default is None.
enable_log (bool, optional) – Whether to enable logging. Default is False. If true, a log file named atomrdf.log will be created in the current working directory.

graph#

The RDF graph.

Type:: rdflib.Graph

sgraph#

The structure graph for a single chosen sample

Type:: rdflib.Graph

ontology#

The ontology object.

Type:: Ontology

terms#

The dictionary of ontology terms.

Type:: dict

store#

The type of store used.

Type:: str

add(triple, validate=True)[source]#: Add a triple to the knowledge graph.

triples(triple)[source]#: Return the triples in the knowledge graph that match the given triple pattern.

query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#: Execute a SPARQL query on the knowledge graph using tools4RDF.

get_sample_as_structure(sample_id)[source]#: Retrieve a sample from the graph as an AtomicScaleSample object.

add(triple)[source]#

Add a triple to the knowledge graph.

Parameters:: triple (tuple) – The triple to be added in the form (subject, predicate, object).

archive(package_name, format='turtle', compress=True, add_simulations=False)[source]#

Publish a dataset from graph including per atom quantities.

Parameters:

package_name (str) – The name of the package to be created.
format (str, optional) – The format in which the dataset should be written. Default is “turtle”.
compress (bool, optional) – Whether to compress the package into a tarball. Default is True.

Raises:

ValueError – If the package_name already exists or if the tarball already exists.

Notes

This method creates a package containing a dataset from the graph, including per atom quantities. The package consists of a folder named package_name, which contains the dataset and related files. If compress is True, the package is compressed into a tarball.

The method performs the following steps:

1. Checks if the package_name already exists. If it does, raises a ValueError.
2. If compress is True, checks if the tarball already exists. If it does, raises a ValueError.
3. Creates a folder named package_name.
4. Creates a subfolder named rdf_structure_store within the package folder.
5. Copies the files associated with each sample to the rdf_structure_store folder, while fixing the paths.
6. Updates the paths in the graph to point to the copied files.
7. Writes the dataset to a file named “triples” within the package folder.
8. If compress is True, compresses the package folder into a tarball.
9. Removes the package folder.
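The packaging steps listed above can be sketched with the standard library alone. This is an illustration, not the atomrdf implementation: `archive_package`, its `files` argument, the `.tar.gz` suffix and the plain-text path table written to `triples` are all assumptions, and the sketch removes the folder only after it has been compressed.

```python
import shutil
import tarfile
from pathlib import Path


def archive_package(package_name, files, compress=True):
    """Hypothetical helper that mirrors the documented packaging steps."""
    package = Path(package_name)
    tarball = Path(f"{package_name}.tar.gz")

    # Steps 1-2: refuse to overwrite an existing folder or tarball.
    if package.exists():
        raise ValueError(f"{package} already exists")
    if compress and tarball.exists():
        raise ValueError(f"{tarball} already exists")

    # Steps 3-4: create the package folder and its structure-store subfolder.
    store = package / "rdf_structure_store"
    store.mkdir(parents=True)

    # Steps 5-6: copy each sample file and record its package-relative path.
    new_paths = {}
    for src in files:
        src = Path(src)
        shutil.copy(src, store / src.name)
        new_paths[str(src)] = f"rdf_structure_store/{src.name}"

    # Step 7: write the dataset; a path table stands in for the real triples.
    lines = [f"{old} -> {new}" for old, new in sorted(new_paths.items())]
    (package / "triples").write_text("\n".join(lines))

    # Steps 8-9: compress the folder, then remove it (assumed to apply
    # only when a tarball was actually produced).
    if compress:
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(package, arcname=package.name)
        shutil.rmtree(package)
        return tarball
    return package
```

Checking for both the folder and the tarball before any file is written means a failed call leaves the working directory untouched.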

close(filename, format='json-ld')[source]#

Close the graph and write to a file

Parameters:: filename (string) – name of output file
Return type:: None

close_store()[source]#

Release the underlying store (close file handles and locks).

This is a no-op for the in-memory store. For file-backed stores (Oxigraph, SQLAlchemy) it releases the file lock so the same store directory can be reopened in the same process or by another process.

Return type:: None

create_node(namestring, classtype, label=None)[source]#

Create a new node in the graph.

Parameters:

namestring (str) – The name of the node.
classtype (Object from a given ontology) – The class type of the node.

Returns:

The newly created node.

Return type:

URIRef

get_sample_as_structure(sample_id)[source]#

Retrieve a sample from the graph as an AtomicScaleSample object.

Parameters:: sample_id (str or URIRef) – The ID of the sample to retrieve
Returns:: The sample as an AtomicScaleSample pydantic object
Return type:: AtomicScaleSample

Examples

>>> kg = KnowledgeGraph()
>>> sample = kg.get_sample_as_structure('sample:123')
>>> atoms = sample.to_structure()  # Convert to ASE Atoms
>>> sample.to_file('output.lmp', format='lammps-dump')

invalidate_cache()[source]#: Invalidate cached derived data (e.g. to force a rebuild).

merge_archive(package_name, compress=True, format='turtle')[source]#

Merge an archived dataset into this KnowledgeGraph.

Unlike unarchive (which creates a new graph), this method loads the triples and structure-store files from an existing archive into the current graph so that multiple datasets can be combined incrementally:

kg = KnowledgeGraph()
kg.merge_archive("dataset_1_GB.tar.gz")
kg.merge_archive("dataset_2_GB.tar.gz")
# kg now contains both datasets

Parameters:

package_name (str) – Path to the archive. When compress is True (default) this should be a .tar.gz file; otherwise the name of an already-extracted directory.
compress (bool, optional) – Whether package_name is a compressed tarball. Default True.
format (str, optional) – RDF serialisation format of the triples file inside the archive. Default "turtle".

Notes

Structure-store JSON files from the archive are copied into self.structure_store. UUID-based filenames make collisions extremely unlikely; if a file with the same name already exists, a warning is emitted and the file is skipped (the existing copy wins).
After parsing, every CMSO.hasPath triple that still references the archive-internal rdf_structure_store/ prefix is rewritten to point at self.structure_store.
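The two notes above describe a copy-unless-present policy and a path-prefix rewrite. A minimal sketch of both, assuming plain directories and string paths, follows; `copy_structure_files` and `rewrite_path` are hypothetical names and do not appear in atomrdf.

```python
import shutil
import warnings
from pathlib import Path

# Prefix used for structure files inside an archive, as named in the notes.
ARCHIVE_PREFIX = "rdf_structure_store/"


def copy_structure_files(archive_store, local_store):
    """Copy JSON files, keeping the existing copy on a name collision."""
    local_store = Path(local_store)
    local_store.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(archive_store).glob("*.json")):
        dst = local_store / src.name
        if dst.exists():
            # The existing copy wins; warn so the skip is visible.
            warnings.warn(f"{dst.name} already exists; keeping existing copy")
            continue
        shutil.copy(src, dst)
        copied.append(dst.name)
    return copied


def rewrite_path(path, local_store):
    """Point an archive-internal path at the local structure store."""
    if path.startswith(ARCHIVE_PREFIX):
        return str(Path(local_store) / path[len(ARCHIVE_PREFIX):])
    return path  # already local, leave untouched
```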

property n_samples#: Number of samples in the Graph

property properties#

Return a pandas DataFrame of all calculated/output properties in the graph.

Each row includes: uri, type, label, value, unit.

Covers any ASMO-typed property (e.g. TotalEnergy, FormationEnergy, CalculatedProperty, OutputParameter) but excludes InputParameter nodes.

The result is cached and automatically recomputed when new triples are added to the graph.

Return type:: pandas.DataFrame
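The caching behaviour described for this property (recompute only after new triples arrive) follows a common invalidate-on-write pattern. The sketch below illustrates that pattern with a plain list instead of a DataFrame; the class `CachedTable`, the `rebuilds` counter and the `"hasValue"` filter are assumptions made for the example.

```python
class CachedTable:
    """Hypothetical container that rebuilds a derived table only when stale."""

    def __init__(self):
        self._triples = []
        self._cache = None   # None marks the cache as stale
        self.rebuilds = 0    # counts rebuilds, for illustration only

    def add(self, triple):
        self._triples.append(triple)
        self._cache = None   # any new triple invalidates the cached result

    def invalidate_cache(self):
        self._cache = None   # manual reset, forces a rebuild on next access

    @property
    def properties(self):
        if self._cache is None:
            self.rebuilds += 1
            # Stand-in for the expensive graph scan.
            self._cache = [t for t in self._triples if t[1] == "hasValue"]
        return self._cache
```

Repeated reads between writes return the same cached object, so the expensive scan runs at most once per batch of additions.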

purge(force=False)[source]#

Remove all information from the KnowledgeGraph.

Parameters:: force (bool, optional) – Whether to proceed with purging the graph. Default is False.
Return type:: None

Notes

This method removes all information from the KnowledgeGraph. If the force parameter is set to False, a warning is issued before proceeding with the purging.

query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#

Execute a SPARQL query on the knowledge graph.

This method supports two query modes:

1. Raw SPARQL query strings (passed as source parameter)
2. Ontology-based queries using tools4RDF (source as OntoTerm)

Parameters:

source (str or OntoTerm) – If str: Raw SPARQL query string to execute directly. If OntoTerm: The source ontology term from which paths are to be queried. Access terms via self.ontology.terms (e.g., self.ontology.terms.cmso.AtomicScaleSample).
destinations (list of OntoTerm or OntoTerm, optional) – One or more destination ontology terms to which paths are to be queried. Can be a single term or a list of terms. If None, all properties of the source are returned. Only used when source is an OntoTerm.
return_df (bool, default=True) – If True, returns results as a pandas DataFrame. Otherwise, returns raw query results.
num_paths (int, default=1) – The number of paths to retrieve for each query when multiple paths exist. Only used when source is an OntoTerm.
limit (int, optional) – The maximum number of results to return. If None, no limit is applied. Only used when source is an OntoTerm.

Returns:

If return_df is True, returns a pandas DataFrame with query results. If return_df is False, returns a list of query results. Returns None if no results are found.

Return type:

pandas.DataFrame or list or None

Examples

Query with raw SPARQL string:

>>> query = '''
... PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
... PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
... SELECT DISTINCT ?symbol
... WHERE {
...     ?sample cmso:hasNumberOfAtoms ?number .
...     ?sample cmso:hasMaterial ?material .
...     ?material cmso:hasStructure ?structure .
...     ?structure cmso:hasSpaceGroupSymbol ?symbol .
...     FILTER (?number="4"^^xsd:integer)
... }'''
>>> df = kg.query(query)

Query for all AtomicScaleSamples with their space group symbols:

>>> kg = KnowledgeGraph()
>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasSpaceGroupSymbol]
... )

Query with filters (using == operator on terms):

>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasNumberOfAtoms == 4]
... )

Notes

When using ontology terms, this method uses tools4RDF to automatically generate SPARQL queries based on the ontology structure. It handles namespace management, path finding between ontology terms, and result formatting automatically.

reconstruct_workflow(workflow_id, output_dir, mode='recreate', structure_format=None)[source]#

Reconstruct a workflow as an executable Python script.

Delegates to atomrdf.io.reconstruct.reconstruct_workflow().

Parameters:

workflow_id (str or URIRef) – URI of the workflow / simulation node.
output_dir (str) – Directory to write the generated script (created if needed).
mode (str) – "recreate" — fully runnable script. "create_template" — skeleton with TODO placeholders.
structure_format (str, optional) – Override structure file format (default "lammps-data").

Returns:

The output_dir path.

Return type:

str

reconstruct_workflow_by_sample(sample_id, output_dir, mode='recreate', structure_format=None)[source]#

Find the workflow that produced sample_id and reconstruct it.

Delegates to atomrdf.io.reconstruct.reconstruct_workflow_by_sample().

Parameters:

sample_id (str or URIRef)
output_dir (str)
mode (str)
structure_format (str, optional)

Returns:

The output_dir path.

Return type:

str

remove(triple)[source]#

Remove a triple from the knowledge graph.

Parameters:: triple (tuple) – The triple to be removed in the form (subject, predicate, object).
Return type:: None

Notes

This method removes a triple from the knowledge graph. The triple should be provided as a tuple in the form (subject, predicate, object).

Examples

>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> graph.remove(("Alice", "likes", "Bob"))

property sample_ids#: Returns a list of all Samples in the graph

property sample_names#: Returns a list of all Sample names in the graph.

search_property(property_type, label=None)[source]#

Return properties matching the given type and optional label.

Parameters:

property_type (str) – ASMO type name to search for, e.g. "TotalEnergy", "FormationEnergy", "CalculatedProperty".
label (str, optional) – If provided, further filter by rdfs:label (case-insensitive).

Returns:

Each tuple is (iri, label, chemical_composition) where chemical_composition is a dict mapping element symbol → ratio (e.g. {"Fe": 1.0}), or an empty dict if no composition is found.

Return type:

list of tuple

to_file(sample, filename, format='lammps-data', copy_from=None, pseudo_files=None)[source]#

Write a sample structure to a file.

Parameters:

sample (str or URIRef) – Sample ID
filename (str) – Name of the output file
format (str, optional) – Format of the output file. Default is ‘lammps-data’. Any format supported by ASE can be used.
copy_from (str, optional) – If provided, input options for quantum-espresso format will be copied from the given file. Structure specific information will be replaced. Note that the validity of input file is not checked.
pseudo_files (list, optional) – If provided, add the pseudopotential filenames to file. Should be in alphabetical order of chemical species symbols.

Return type:

None

Examples

>>> kg = KnowledgeGraph()
>>> kg.to_file('sample:123', 'output.lmp', 'lammps-data')
>>> kg.to_file('sample:456', 'POSCAR', 'vasp')

to_gexf(output_file, include_literals=False, positions=None, sizes=None, top_n_labels=None, label_overrides=None)[source]#

Export the knowledge graph to GEXF format for visualisation in Gephi.

Nodes are coloured by semantic category:

Sample (orange) — cmso:AtomicScaleSample instances
Material (purple) — material description nodes
Structure (blue) — crystal-structure / unit-cell nodes
Element (green) — chemical element / species nodes
Calculation (red) — simulation / activity nodes
Potential (gold) — interatomic potential nodes
Property (teal) — calculated-property nodes
Literal (light grey) — RDF literal values (if included)
Other (grey) — ontology terms & everything else

Gephi reads the viz:color attribute natively. The category node attribute can additionally be used in Gephi’s Partition panel.

Parameters:

output_file (str) – Destination path for the .gexf file.
include_literals (bool, optional) – Whether to add a node for every RDF literal value. Default is False, which drops literal nodes and their edges, producing a cleaner resource-only graph that is easier to explore in Gephi.

Returns:

output_file – The path of the file that was written.

Return type:

str

trace(sample_or_property)[source]#

Trace the provenance of a sample or calculated property.

Parameters:: sample_or_property (str or URIRef) – A sample URI (e.g. "sample:abc") or a calculated-property URI. If the URI matches a sample the trace walks backwards from that sample; if it matches a property the owning sample is found first.
Returns:: An iterable of pipeline step dicts with reconstructed ASE structures, method metadata, parameters, etc.
Return type:: Provenance

triples(triple)[source]#

Return the triples in the knowledge graph that match the given triple pattern.

Parameters:: triple (tuple) – The triple pattern to match in the form (subject, predicate, object).
Returns:: A generator that yields the matching triples.
Return type:: generator
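Triple-pattern matching can be pictured as filtering with wildcards; in rdflib, `None` in any position of the pattern matches every term. The sketch below shows that idea on plain tuples and is not the store-backed lookup the package uses; `match_triples` and `first_value` are hypothetical helpers.

```python
def match_triples(triples, pattern):
    """Yield every triple that agrees with the pattern; None matches anything."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def first_value(triples, subject, predicate):
    """Return the object of the first (subject, predicate, ?) match, else None."""
    for _, _, obj in match_triples(triples, (subject, predicate, None)):
        return obj
    return None


data = [
    ("Alice", "likes", "Bob"),
    ("Alice", "knows", "Carol"),
    ("Dave", "likes", "Bob"),
]
everything_about_alice = list(match_triples(data, ("Alice", None, None)))
```

Because `match_triples` is a generator, a caller that only needs the first hit (as `first_value` does) stops scanning as soon as one is found.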

classmethod unarchive(package_name, compress=True, store='Memory', store_file=None, identifier='http://default_graph', ontology=None)[source]#

Unarchive a package and return a KnowledgeGraph instance.

Parameters:

package_name (str) – The name of the package to unarchive.
compress (bool, optional) – Whether the package is a compressed tarball. Defaults to True.
store (str, optional) – The type of store to use. Defaults to “Memory”.
store_file (str, optional) – The file to use for the store. Defaults to None.
identifier (str, optional) – The identifier for the graph. Defaults to “http://default_graph”.
ontology (str, optional) – The ontology to use. Defaults to None.

Returns:

A KnowledgeGraph instance built from the package.

Return type:

KnowledgeGraph

Raises:

FileNotFoundError – If the package file is not found.
tarfile.TarError – If there is an error while extracting the package.

value(arg1, arg2)[source]#

Get the value of a triple in the knowledge graph.

Parameters:

arg1 (object) – The subject of the triple.
arg2 (object) – The predicate of the triple.

Returns:

The value of the triple if it exists, otherwise None.

Return type:

object or None

Notes

This method retrieves the value of a triple in the knowledge graph. The triple is specified by providing the subject and predicate as arguments. If the triple exists in the graph, the corresponding value is returned. If the triple does not exist, None is returned.

Examples

>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> value = graph.value("Alice", "likes")
>>> print(value)
Bob

visualise(styledict=None, rankdir='BT', hide_types=False, workflow_view=False, sample_view=False, size=None, layout='neato')[source]#

Visualize the RDF tree of the Graph.

Parameters:

styledict (dict, optional) – If provided, allows customization of color and other properties.
rankdir (str, optional) – The direction of the graph layout. Default is “BT” (bottom to top).
hide_types (bool, optional) – Whether to hide the types in the visualization. Default is False.
workflow_view (bool, optional) – Whether to enable the workflow view. Default is False.
sample_view (bool, optional) – Whether to enable the sample view. Default is False.
size (tuple, optional) – The size of the visualization. Default is None.
layout (str, optional) – The name of the layout algorithm for the graph. Default is “neato”.

Returns:

The visualization of the RDF tree.

Return type:

graphviz.dot.Digraph

Notes

The styledict parameter allows customization of the visualization style. It has the following options:

BNode:

color (str) – The color of the BNode boxes.
shape (str) – The shape of the BNode boxes.
style (str) – The style of the BNode boxes.

URIRef:

color (str) – The color of the URIRef boxes.
shape (str) – The shape of the URIRef boxes.
style (str) – The style of the URIRef boxes.

Literal:

color (str) – The color of the Literal boxes.
shape (str) – The shape of the Literal boxes.
style (str) – The style of the Literal boxes.
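A styledict with the layout described above is a nested dictionary keyed by node type. The sketch below builds one and overlays it on a set of defaults; the default values are placeholders rather than the package's real defaults, and whether atomrdf merges a partial styledict in this way is an assumption.

```python
from copy import deepcopy

# Placeholder defaults; the package's own defaults are not reproduced here.
DEFAULT_STYLE = {
    "BNode": {"color": "#ffe6ff", "shape": "box", "style": "filled"},
    "URIRef": {"color": "#ffffcc", "shape": "box", "style": "filled"},
    "Literal": {"color": "#e6ffcc", "shape": "parallelogram", "style": "filled"},
}


def merge_style(styledict=None):
    """Overlay user overrides on the defaults without mutating either input."""
    merged = deepcopy(DEFAULT_STYLE)
    for node_type, overrides in (styledict or {}).items():
        merged.setdefault(node_type, {}).update(overrides)
    return merged


# Override a single key; every other entry keeps its default value.
custom = merge_style({"Literal": {"color": "lightgrey"}})
```

A dictionary of this shape is what would be passed as the styledict argument of visualise.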

visualize(*args, **kwargs)[source]#

Visualizes the graph using the specified arguments.

This method is a wrapper around the visualise method and passes the same arguments to it.

Parameters:

*args (Variable length argument list.)
**kwargs (Arbitrary keyword arguments.)

Returns:

The visualization of the RDF tree.

Return type:

graphviz.dot.Digraph

write(filename, format='json-ld')[source]#

Write the serialised version of the graph to a file

Parameters:

filename (string) – name of output file
format (string, {'turtle', 'xml', 'json-ld', 'ntriples', 'n3'}) – output format to be written to

Return type:

None

class atomrdf.WorkflowParser(kg: KnowledgeGraph | None = None, precision: int = 6, debug: bool = False, hash_threshold: int | None = 10000)[source]#

Parser that converts workflow YAML/JSON files into an RDF knowledge graph.

Handles parsing of:

- Computational samples (with deduplication via hashing)
- Workflows/Simulations
- Operations (transformations between samples: DeleteAtom, SubstituteAtom, AddAtom, Rotate, Translate, Shear)
- Math operations (ASMO arithmetic: Subtraction, Addition, Multiplication, Division, Exponentiation)

kg#

The knowledge graph to populate

Type:: KnowledgeGraph

precision#

Decimal precision for hash computation

Type:: int

sample_map#

Maps original sample IDs to resolved URIs

Type:: dict

property_map#

Maps user-defined property IDs (from YAML ‘id’ fields on calculated_property / input_parameter / output_parameter entries) to their generated KG URI strings. Built incrementally as workflows and math operations are parsed, so later math_operation entries can reference earlier properties by their local ID.

Type:: dict

debug#

If True, print debug messages during parsing

Type:: bool

hash_threshold#

Skip hashing for samples with more than this many atoms. Set to None to disable hashing completely.

Type:: int or None

from_file(filepath: str | Path) → Dict[str, Any][source]#

Parse workflow data from a YAML or JSON file.

This is a convenience method that calls parse() with a file path.

Parameters:: filepath (str or Path) – Path to YAML or JSON file
Returns:: Parse results dictionary
Return type:: dict
Raises:: ValueError – If file format is not supported (must be .yaml, .yml, or .json)

parse(data: str | Path | Dict[str, Any]) → Dict[str, Any][source]#

Parse complete workflow data structure.

Parameters:

data (str, Path, or dict) – Either a file path (str/Path) to a YAML/JSON file, or a dictionary containing computational_sample, workflow, and/or activity keys

Returns:

Dictionary with the following keys:

'sample_map' : dict mapping original IDs to URIs
'workflow_uris' : list of created workflow URIs
'operation_uris' : list of created operation URIs

Return type:

dict

Raises:

ValueError – If file format is not supported (must be .yaml, .yml, or .json)
TypeError – If data type is not supported
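The input handling described above (a dictionary is used directly, a path is loaded by extension, anything else is rejected) can be sketched as a small dispatcher. `load_workflow_data` is a hypothetical helper, not part of atomrdf, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path

SUPPORTED_SUFFIXES = {".yaml", ".yml", ".json"}


def load_workflow_data(data):
    """Return workflow data as a dict, whatever form it arrived in."""
    if isinstance(data, dict):
        return data  # already parsed, nothing to load
    if isinstance(data, (str, Path)):
        path = Path(data)
        suffix = path.suffix.lower()
        if suffix not in SUPPORTED_SUFFIXES:
            raise ValueError(f"Unsupported file format: {suffix!r}")
        if suffix == ".json":
            with path.open() as handle:
                return json.load(handle)
        import yaml  # PyYAML, imported lazily so JSON-only use needs no extra package

        with path.open() as handle:
            return yaml.safe_load(handle)
    raise TypeError(f"Unsupported data type: {type(data).__name__}")
```

Checking the suffix before opening the file means an unsupported extension fails with ValueError even when the file does not exist.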

parse_math_operations(math_op_data_list: List[Dict[str, Any]]) → List[str][source]#

Parse math-operation entries (ASMO arithmetic activities).

Each entry must have a type key (one of Subtraction, Addition, Multiplication, Division, Exponentiation). Operands may be local property-ID strings (resolved via self.property_map) or numeric scalars. If the result carries an id field it is registered in property_map so subsequent math_operation entries can use it as an operand.

Parameters:: math_op_data_list (list of dict)
Returns:: List of math-operation activity-ID strings created.
Return type:: list of str
Raises:: ValueError – If the type field is missing or unrecognised.
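The operand-resolution rule above (property IDs are looked up, numbers pass through, and a result with an id becomes available to later entries) is sketched below. The sketch evaluates the arithmetic on plain floats purely to make the chaining visible, whereas the parser itself records the operations in the graph; the entry keys `operands` and `id` and the function name are assumptions.

```python
import operator

# The five operation types named in the documentation.
OPERATIONS = {
    "Subtraction": operator.sub,
    "Addition": operator.add,
    "Multiplication": operator.mul,
    "Division": operator.truediv,
    "Exponentiation": operator.pow,
}


def evaluate_math_operations(entries, property_values):
    """Evaluate entries in order, letting later ones reference earlier results."""
    values = dict(property_values)  # local copy, plays the role of property_map
    results = []
    for entry in entries:
        func = OPERATIONS.get(entry.get("type"))
        if func is None:
            raise ValueError(f"Missing or unrecognised type: {entry.get('type')!r}")
        # Strings are local property IDs; anything else is a numeric scalar.
        left, right = (
            values[item] if isinstance(item, str) else item
            for item in entry["operands"]
        )
        result = func(left, right)
        if "id" in entry:
            values[entry["id"]] = result  # register for subsequent entries
        results.append(result)
    return results


entries = [
    {"type": "Subtraction", "operands": ["e_defect", "e_bulk"], "id": "diff"},
    {"type": "Division", "operands": ["diff", 2], "id": "half"},
]
results = evaluate_math_operations(entries, {"e_defect": 5.0, "e_bulk": 3.0})
```

Order matters: an entry can only reference an ID that an earlier entry, or the initial mapping, has already registered.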

parse_operations(operation_data_list: List[Dict[str, Any]]) → List[str][source]#

Parse operation data (transformations between samples).

Operations include: DeleteAtom, SubstituteAtom, AddAtom, Rotate, Translate, and Shear.

Parameters:: operation_data_list (list of dict) – List of operation dictionaries. Each must have:

- 'method': The operation type (e.g., 'DeleteAtom', 'Rotate')
- 'input_sample': Sample ID or list of sample IDs
- 'output_sample': Sample ID or list of sample IDs
- Additional method-specific parameters (e.g., rotation_matrix for Rotate)

Returns:: List of operation URIs created
Return type:: list of str
Raises:: ValueError – If operation method is not recognized

parse_samples(sample_data_list: List[Dict[str, Any]]) → Dict[str, str][source]#

Parse computational sample data and add to knowledge graph.

Performs deduplication via hash-based lookup. If a sample with the same hash already exists, reuses the existing URI. If atom_attribute contains a file_path key, the structure file is read via ASE and atoms are resolved before building the sample object.

Parameters:: sample_data_list (list of dict) – List of sample dictionaries
Returns:: Dictionary mapping original sample IDs to resolved URIs
Return type:: dict
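Hash-based deduplication, together with the precision and hash_threshold attributes described earlier, can be sketched as follows. The hashed fields (species and positions), the SHA-256 digest and the `sample:` URI scheme are assumptions chosen for the example; the quantities atomrdf actually hashes are not specified here.

```python
import hashlib
import json


def sample_hash(species, positions, precision=6, hash_threshold=10000):
    """Return a digest for a sample, or None when hashing is skipped."""
    if hash_threshold is None or len(species) > hash_threshold:
        return None
    # Round coordinates so numerical noise below `precision` is ignored;
    # adding 0.0 turns -0.0 into 0.0 so both serialise identically.
    rounded = [[round(float(c), precision) + 0.0 for c in pos] for pos in positions]
    payload = json.dumps({"species": list(species), "positions": rounded})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def resolve_samples(samples, precision=6, hash_threshold=10000):
    """Map each original sample ID to a URI, reusing URIs for equal hashes."""
    sample_map, uri_by_hash = {}, {}
    for sample in samples:
        digest = sample_hash(
            sample["species"], sample["positions"], precision, hash_threshold
        )
        uri = uri_by_hash.get(digest) if digest is not None else None
        if uri is None:
            uri = f"sample:{sample['id']}"  # hypothetical URI scheme
            if digest is not None:
                uri_by_hash[digest] = uri
        sample_map[sample["id"]] = uri
    return sample_map


samples = [
    {"id": "a", "species": ["Fe", "Fe"], "positions": [[0, 0, 0], [0.5, 0.5, 0.5]]},
    {"id": "b", "species": ["Fe", "Fe"], "positions": [[0, 0, 0], [0.5 + 1e-9, 0.5, 0.5]]},
    {"id": "c", "species": ["Fe", "Cr"], "positions": [[0, 0, 0], [0.5, 0.5, 0.5]]},
]
sample_map = resolve_samples(samples)
```

Samples above the threshold receive no digest and are therefore never merged, which trades deduplication for parsing speed on large structures.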

parse_workflows(workflow_data_list: List[Dict[str, Any]]) → List[str][source]#

Parse workflow/simulation data and add to knowledge graph.

Resolves sample references using the sample_map.

Parameters:: workflow_data_list (list of dict) – List of workflow dictionaries
Returns:: List of workflow URIs created
Return type:: list of str

KnowledgeGraph#

The graph module contains the basic RDFGraph object in atomrdf. This object takes a structure as input and annotates it with the CMSO ontology (and with PLDO and PODO as needed). The annotated object is stored as triples.

Notes

Always add type triples before adding further properties.

Classes#

KnowledgeGraph: Represents a knowledge graph that stores and annotates structure objects.

defstyledict#

A dictionary containing default styles for visualizing the graph.

Type:: dict

class atomrdf.graph.KnowledgeGraph(graph_file=None, store='Memory', store_file=None, identifier='http://default_graph', ontology=None, structure_store=None, enable_log=False)[source]#

Represents a knowledge graph.

Parameters:

graph_file (str, optional) – The path to the graph file to be parsed. Default is None.
store (str, optional) – The type of store to use. Default is “Memory”.
store_file (str, optional) – The path to the store file. Default is None.
identifier (str, optional) – The identifier for the graph. Default is “http://default_graph”.
ontology (Ontology, optional) – The ontology object to be used. Default is None.
structure_store (StructureStore, optional) – The structure store object to be used. Default is None.
enable_log (bool, optional) – Whether to enable logging. Default is False. If true, a log file named atomrdf.log will be created in the current working directory.

graph#

The RDF graph.

Type:: rdflib.Graph

sgraph#

The structure graph for a single chosen sample

Type:: rdflib.Graph

ontology#

The ontology object.

Type:: Ontology

terms#

The dictionary of ontology terms.

Type:: dict

store#

The type of store used.

Type:: str

add(triple, validate=True)[source]#: Add a triple to the knowledge graph.

triples(triple)[source]#: Return the triples in the knowledge graph that match the given triple pattern.

query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#: Execute a SPARQL query on the knowledge graph using tools4RDF.

get_sample_as_structure(sample_id)[source]#: Retrieve a sample from the graph as an AtomicScaleSample object.

add(triple)[source]#

Add a triple to the knowledge graph.

Parameters:: triple (tuple) – The triple to be added in the form (subject, predicate, object).

archive(package_name, format='turtle', compress=True, add_simulations=False)[source]#

Publish a dataset from graph including per atom quantities.

Parameters:

package_name (str) – The name of the package to be created.
format (str, optional) – The format in which the dataset should be written. Default is “turtle”.
compress (bool, optional) – Whether to compress the package into a tarball. Default is True.

Raises:

ValueError – If the package_name already exists or if the tarball already exists.

Notes

This method creates a package containing a dataset from the graph, including per atom quantities. The package consists of a folder named package_name, which contains the dataset and related files. If compress is True, the package is compressed into a tarball.

The method performs the following steps:

1. Checks if the package_name already exists. If it does, raises a ValueError.
2. If compress is True, checks if the tarball already exists. If it does, raises a ValueError.
3. Creates a folder named package_name.
4. Creates a subfolder named rdf_structure_store within the package folder.
5. Copies the files associated with each sample to the rdf_structure_store folder, while fixing the paths.
6. Updates the paths in the graph to point to the copied files.
7. Writes the dataset to a file named “triples” within the package folder.
8. If compress is True, compresses the package folder into a tarball.
9. Removes the package folder.

close(filename, format='json-ld')[source]#

Close the graph and write to a file

Parameters:: filename (string) – name of output file
Return type:: None

close_store()[source]#

Release the underlying store (close file handles and locks).

This is a no-op for the in-memory store. For file-backed stores (Oxigraph, SQLAlchemy) it releases the file lock so the same store directory can be reopened in the same process or by another process.

Return type:: None

create_node(namestring, classtype, label=None)[source]#

Create a new node in the graph.

Parameters:

namestring (str) – The name of the node.
classtype (Object from a given ontology) – The class type of the node.

Returns:

The newly created node.

Return type:

URIRef

get_sample_as_structure(sample_id)[source]#

Retrieve a sample from the graph as an AtomicScaleSample object.

Parameters:: sample_id (str or URIRef) – The ID of the sample to retrieve
Returns:: The sample as an AtomicScaleSample pydantic object
Return type:: AtomicScaleSample

Examples

>>> kg = KnowledgeGraph()
>>> sample = kg.get_sample_as_structure('sample:123')
>>> atoms = sample.to_structure()  # Convert to ASE Atoms
>>> sample.to_file('output.lmp', format='lammps-dump')

invalidate_cache()[source]#: Invalidate cached derived data (e.g. to force a rebuild).

merge_archive(package_name, compress=True, format='turtle')[source]#

Merge an archived dataset into this KnowledgeGraph.

Unlike unarchive (which creates a new graph), this method loads the triples and structure-store files from an existing archive into the current graph so that multiple datasets can be combined incrementally:

kg = KnowledgeGraph()
kg.merge_archive("dataset_1_GB.tar.gz")
kg.merge_archive("dataset_2_GB.tar.gz")
# kg now contains both datasets

Parameters:

package_name (str) – Path to the archive. When compress is True (default) this should be a .tar.gz file; otherwise the name of an already-extracted directory.
compress (bool, optional) – Whether package_name is a compressed tarball. Default True.
format (str, optional) – RDF serialisation format of the triples file inside the archive. Default "turtle".

Notes

Structure-store JSON files from the archive are copied into self.structure_store. UUID-based filenames make collisions extremely unlikely; if a file with the same name already exists, a warning is emitted and the file is skipped (the existing copy wins).
After parsing, every CMSO.hasPath triple that still references the archive-internal rdf_structure_store/ prefix is rewritten to point at self.structure_store.

property n_samples#: Number of samples in the Graph

property properties#

Return a pandas DataFrame of all calculated/output properties in the graph.

Each row includes: uri, type, label, value, unit.

Covers any ASMO-typed property (e.g. TotalEnergy, FormationEnergy, CalculatedProperty, OutputParameter) but excludes InputParameter nodes.

The result is cached and automatically recomputed when new triples are added to the graph.

Return type:: pandas.DataFrame

purge(force=False)[source]#

Remove all information from the KnowledgeGraph.

Parameters:: force (bool, optional) – Whether to proceed with purging the graph. Default is False.
Return type:: None

Notes

This method removes all information from the KnowledgeGraph. If the force parameter is set to False, a warning is issued before proceeding with the purging.

query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#

Execute a SPARQL query on the knowledge graph.

This method supports two query modes:

1. Raw SPARQL query strings (passed as source parameter)
2. Ontology-based queries using tools4RDF (source as OntoTerm)

Parameters:

source (str or OntoTerm) – If str: Raw SPARQL query string to execute directly. If OntoTerm: The source ontology term from which paths are to be queried. Access terms via self.ontology.terms (e.g., self.ontology.terms.cmso.AtomicScaleSample).
destinations (list of OntoTerm or OntoTerm, optional) – One or more destination ontology terms to which paths are to be queried. Can be a single term or a list of terms. If None, all properties of the source are returned. Only used when source is an OntoTerm.
return_df (bool, default=True) – If True, returns results as a pandas DataFrame. Otherwise, returns raw query results.
num_paths (int, default=1) – The number of paths to retrieve for each query when multiple paths exist. Only used when source is an OntoTerm.
limit (int, optional) – The maximum number of results to return. If None, no limit is applied. Only used when source is an OntoTerm.

Returns:

If return_df is True, returns a pandas DataFrame with query results. If return_df is False, returns a list of query results. Returns None if no results are found.

Return type:

pandas.DataFrame or list or None

Examples

Query with raw SPARQL string:

>>> query = '''
... PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
... PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
... SELECT DISTINCT ?symbol
... WHERE {
...     ?sample cmso:hasNumberOfAtoms ?number .
...     ?sample cmso:hasMaterial ?material .
...     ?material cmso:hasStructure ?structure .
...     ?structure cmso:hasSpaceGroupSymbol ?symbol .
...     FILTER (?number="4"^^xsd:integer)
... }'''
>>> df = kg.query(query)

Query for all AtomicScaleSamples with their space group symbols:

>>> kg = KnowledgeGraph()
>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasSpaceGroupSymbol]
... )

Query with filters (using == operator on terms):

>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasNumberOfAtoms == 4]
... )

Notes

When using ontology terms, this method uses tools4RDF to automatically generate SPARQL queries based on the ontology structure. It handles namespace management, path finding between ontology terms, and result formatting automatically.
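
Because the limit argument is documented as applying only to OntoTerm sources, a raw SPARQL string has to carry its own LIMIT clause. The following is a minimal sketch of a helper that assembles such a string; build_sample_query is a hypothetical name used for illustration and is not part of atomRDF. The xsd prefix is declared explicitly so the typed literal in the FILTER resolves without relying on default namespace bindings.

```python
from typing import Optional

CMSO = "http://purls.helmholtz-metadaten.de/cmso/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def build_sample_query(n_atoms: int, limit: Optional[int] = None) -> str:
    """Build a raw SPARQL string selecting space group symbols.

    Hypothetical helper for illustration; not part of atomRDF.
    """
    lines = [
        f"PREFIX cmso: <{CMSO}>",
        f"PREFIX xsd: <{XSD}>",
        "SELECT DISTINCT ?symbol",
        "WHERE {",
        "    ?sample cmso:hasNumberOfAtoms ?number .",
        "    ?sample cmso:hasMaterial ?material .",
        "    ?material cmso:hasStructure ?structure .",
        "    ?structure cmso:hasSpaceGroupSymbol ?symbol .",
        f'    FILTER (?number = "{int(n_atoms)}"^^xsd:integer)',
        "}",
    ]
    # A raw query is executed as written, so the limit goes into the string.
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


query = build_sample_query(4, limit=10)
print(query)
```

The resulting string can then be passed as the source argument, for example kg.query(query).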

reconstruct_workflow(workflow_id, output_dir, mode='recreate', structure_format=None)[source]#

Reconstruct a workflow as an executable Python script.

Delegates to atomrdf.io.reconstruct.reconstruct_workflow().

Parameters:

workflow_id (str or URIRef) – URI of the workflow / simulation node.
output_dir (str) – Directory to write the generated script (created if needed).
mode (str) – "recreate" — fully runnable script. "create_template" — skeleton with TODO placeholders.
structure_format (str, optional) – Override structure file format (default "lammps-data").

Returns:

The output_dir path.

Return type:

str

reconstruct_workflow_by_sample(sample_id, output_dir, mode='recreate', structure_format=None)[source]#

Find the workflow that produced sample_id and reconstruct it.

Delegates to atomrdf.io.reconstruct.reconstruct_workflow_by_sample().

Parameters:

sample_id (str or URIRef)
output_dir (str)
mode (str)
structure_format (str, optional)

Returns:

The output_dir path.

Return type:

str

remove(triple)[source]#

Remove a triple from the knowledge graph.

Parameters:: triple (tuple) – The triple to be removed in the form (subject, predicate, object).
Return type:: None

Notes

This method removes a triple from the knowledge graph. The triple should be provided as a tuple in the form (subject, predicate, object).

Examples

>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> graph.remove(("Alice", "likes", "Bob"))

property sample_ids#: Returns a list of the IDs of all Samples in the graph.

property sample_names#: Returns a list of all Sample names in the graph.

search_property(property_type, label=None)[source]#

Return properties matching the given type and optional label.

Parameters:

property_type (str) – ASMO type name to search for, e.g. "TotalEnergy", "FormationEnergy", "CalculatedProperty".
label (str, optional) – If provided, further filter by rdfs:label (case-insensitive).

Returns:

Each tuple is (iri, label, chemical_composition) where chemical_composition is a dict mapping element symbol → ratio (e.g. {"Fe": 1.0}), or an empty dict if no composition is found.

Return type:

list of tuple
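
As a rough illustration of how the returned tuples might be consumed, the sketch below filters rows by an element present in the composition dict. The rows are hypothetical stand-ins written in the documented (iri, label, chemical_composition) shape, and filter_by_element is an illustrative helper, not an atomRDF function.

```python
# Hypothetical results in the documented (iri, label, chemical_composition) shape.
results = [
    ("property:1", "TotalEnergy", {"Fe": 1.0}),
    ("property:2", "TotalEnergy", {"Fe": 0.5, "Al": 0.5}),
    ("property:3", "TotalEnergy", {}),  # no composition found
]


def filter_by_element(rows, element):
    """Keep rows whose composition dict contains the given element symbol."""
    return [row for row in rows if element in row[2]]


fe_rows = filter_by_element(results, "Fe")
for iri, label, composition in fe_rows:
    print(iri, label, composition)
```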

to_file(sample, filename, format='lammps-data', copy_from=None, pseudo_files=None)[source]#

Write a sample structure to a file.

Parameters:

sample (str or URIRef) – Sample ID
filename (str) – Name of the output file
format (str, optional) – Format of the output file. Default is ‘lammps-data’. Any format supported by ASE can be used.
copy_from (str, optional) – If provided, input options for quantum-espresso format will be copied from the given file. Structure specific information will be replaced. Note that the validity of input file is not checked.
pseudo_files (list, optional) – If provided, add the pseudopotential filenames to file. Should be in alphabetical order of chemical species symbols.

Return type:

None

Examples

>>> kg = KnowledgeGraph()
>>> kg.to_file('sample:123', 'output.lmp', 'lammps-data')
>>> kg.to_file('sample:456', 'POSCAR', 'vasp')

to_gexf(output_file, include_literals=False, positions=None, sizes=None, top_n_labels=None, label_overrides=None)[source]#

Export the knowledge graph to GEXF format for visualisation in Gephi.

Nodes are coloured by semantic category:

Sample (orange) — cmso:AtomicScaleSample instances
Material (purple) — material description nodes
Structure (blue) — crystal-structure / unit-cell nodes
Element (green) — chemical element / species nodes
Calculation (red) — simulation / activity nodes
Potential (gold) — interatomic potential nodes
Property (teal) — calculated-property nodes
Literal (light grey) — RDF literal values (if included)
Other (grey) — ontology terms & everything else

Gephi reads the viz:color attribute natively. The category node attribute can additionally be used in Gephi’s Partition panel.

Parameters:

output_file (str) – Destination path for the .gexf file.
include_literals (bool, optional) – Whether to add a node for every RDF literal value. Default is False, which drops literal nodes and their edges, producing a cleaner resource-only graph that is easier to explore in Gephi.

Returns:

output_file – The path of the file that was written.

Return type:

str

trace(sample_or_property)[source]#

Trace the provenance of a sample or calculated property.

Parameters:: sample_or_property (str or URIRef) – A sample URI (e.g. "sample:abc") or a calculated-property URI. If the URI matches a sample the trace walks backwards from that sample; if it matches a property the owning sample is found first.
Returns:: An iterable of pipeline step dicts with reconstructed ASE structures, method metadata, parameters, etc.
Return type:: Provenance

triples(triple)[source]#

Return the triples in the knowledge graph that match the given triple pattern.

Parameters:: triple (tuple) – The triple pattern to match in the form (subject, predicate, object).
Returns:: A generator that yields the matching triples.
Return type:: generator
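
The idea of a triple pattern can be sketched in plain Python. This assumes the usual RDFLib convention that None in a pattern position matches any term; whether KnowledgeGraph.triples follows it exactly is not stated here. The match_triples function and the string triples are illustrative only.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match_triples(store: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield triples whose terms equal every non-None term of the pattern."""
    for triple in store:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# Hypothetical triples written as plain strings for readability.
store = [
    ("sample:1", "hasMaterial", "material:1"),
    ("sample:2", "hasMaterial", "material:2"),
    ("material:1", "hasStructure", "structure:1"),
]

# None acts as a wildcard for subject and object.
matches = list(match_triples(store, (None, "hasMaterial", None)))
print(matches)
```

Like the documented method, the sketch returns a generator, so results are produced lazily and must be iterated or wrapped in list().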

classmethod unarchive(package_name, compress=True, store='Memory', store_file=None, identifier='http://default_graph', ontology=None)[source]#

Unarchive a package and return a KnowledgeGraph instance.

Parameters:

package_name (str) – The name of the package to unarchive.
compress (bool, optional) – Whether the package is compressed. Defaults to True.
store (str, optional) – The type of store to use. Defaults to “Memory”.
store_file (str, optional) – The file to use for the store. Defaults to None.
identifier (str, optional) – The identifier for the graph. Defaults to “http://default_graph”.
ontology (str, optional) – The ontology to use. Defaults to None.

Returns:

An instance of the KnowledgeGraph class.

Return type:

KnowledgeGraph

Raises:

FileNotFoundError – If the package file is not found.
tarfile.TarError – If there is an error while extracting the package.

value(arg1, arg2)[source]#

Get the value of a triple in the knowledge graph.

Parameters:

arg1 (object) – The subject of the triple.
arg2 (object) – The predicate of the triple.

Returns:

The value of the triple if it exists, otherwise None.

Return type:

object or None

Notes

This method retrieves the value of a triple in the knowledge graph. The triple is specified by providing the subject and predicate as arguments. If the triple exists in the graph, the corresponding value is returned. If the triple does not exist, None is returned.

Examples

>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> value = graph.value("Alice", "likes")
>>> print(value)
Bob

visualise(styledict=None, rankdir='BT', hide_types=False, workflow_view=False, sample_view=False, size=None, layout='neato')[source]#

Visualize the RDF tree of the Graph.

Parameters:

styledict (dict, optional) – If provided, allows customization of color and other properties.
rankdir (str, optional) – The direction of the graph layout. Default is “BT” (bottom to top).
hide_types (bool, optional) – Whether to hide the types in the visualization. Default is False.
workflow_view (bool, optional) – Whether to enable the workflow view. Default is False.
sample_view (bool, optional) – Whether to enable the sample view. Default is False.
size (tuple, optional) – The size of the visualization. Default is None.
layout (str, optional) – The name of the layout algorithm for the graph. Default is “neato”.

Returns:

The visualization of the RDF tree.

Return type:

graphviz.dot.Digraph

Notes

The styledict parameter allows customization of the visualization style. It has the following options:

BNode:

color (str): The color of the BNode boxes.
shape (str): The shape of the BNode boxes.
style (str): The style of the BNode boxes.

URIRef:

color (str): The color of the URIRef boxes.
shape (str): The shape of the URIRef boxes.
style (str): The style of the URIRef boxes.

Literal:

color (str): The color of the Literal boxes.
shape (str): The shape of the Literal boxes.
style (str): The style of the Literal boxes.
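
A styledict with this layout can be sketched as follows. The default values are copied from the visualize_graph signature in this reference; merge_style is a hypothetical helper. Building a complete dictionary avoids depending on how partially specified dictionaries are handled, which is not documented here.

```python
import copy

# Default style, copied from the visualize_graph signature in this reference.
DEFAULT_STYLE = {
    "BNode": {"color": "#ffe6ff", "shape": "box", "style": "filled"},
    "Literal": {"color": "#e6ffcc", "shape": "ellipse", "style": "filled"},
    "URIRef": {"color": "#ffffcc", "shape": "box", "style": "filled"},
}


def merge_style(overrides):
    """Return a copy of DEFAULT_STYLE with per-node-type overrides applied."""
    merged = copy.deepcopy(DEFAULT_STYLE)
    for node_type, options in overrides.items():
        merged.setdefault(node_type, {}).update(options)
    return merged


# Change only the Literal colour; shape and style keep their defaults.
styledict = merge_style({"Literal": {"color": "#cce6ff"}})
print(styledict["Literal"])
```

The merged dictionary could then be passed as kg.visualise(styledict=styledict).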

visualize(*args, **kwargs)[source]#

Visualizes the graph using the specified arguments.

This method is a wrapper around the visualise method and passes the same arguments to it.

Parameters:

*args (Variable length argument list.)
**kwargs (Arbitrary keyword arguments.)

Returns:

The visualization of the RDF tree.

Return type:

graphviz.dot.Digraph

write(filename, format='json-ld')[source]#

Write the serialised version of the graph to a file.

Parameters:

filename (str) – Name of the output file.
format (str, optional) – Output format, one of 'turtle', 'xml', 'json-ld', 'ntriples', 'n3'. Default is 'json-ld'.

Return type:

None

Namespace#

Stores#

atomrdf.stores.create_store(kg, store, identifier, store_file=None, structure_store=None)[source]#

Create a store based on the given parameters.

Parameters:

kg (KnowledgeGraph) – The knowledge graph object.
store (str or Project) – The type of store to create. It can be either “Memory”, “SQLAlchemy”, or a pyiron Project object.
identifier (str) – The identifier for the store.
store_file (str, optional) – The file path to store the data (only applicable for certain store types).
structure_store (str, optional) – The structure store to use (only applicable for certain store types).

Raises:

ValueError – If an unknown store type is provided.

atomrdf.stores.store_alchemy(kg, store, identifier, store_file=None, structure_store=None)[source]#

Store the knowledge graph using SQLAlchemy.

Parameters:

kg (KnowledgeGraph) – The knowledge graph to be stored.
store (str) – The type of store to be used.
identifier (str) – The identifier for the graph.
store_file (str, optional) – The file path for the store. Required if store is not ‘memory’.
structure_store (str, optional) – The structure store to be used.

Raises:

ValueError – If store_file is None and store is not ‘memory’.

Return type:

None

atomrdf.stores.store_memory(kg, store, identifier, store_file=None, structure_store=None)[source]#

Store the knowledge graph in memory.

Parameters:

kg (KnowledgeGraph) – The knowledge graph to be stored.
store (str) – The type of store to use for storing the graph.
identifier (str) – The identifier for the graph.
store_file (str, optional) – The file to store the graph in. Defaults to None.
structure_store (str, optional) – The structure store to use. Defaults to None.

Return type:

None

atomrdf.stores.store_oxigraph(kg, store, identifier, store_file=None, structure_store=None)[source]#

Store the knowledge graph using Oxigraph (via oxrdflib).

Parameters:

kg (KnowledgeGraph) – The knowledge graph to be stored.
store (str) – The type of store to be used.
identifier (str or URIRef) – The URI identifier for the named graph. Must be consistent across open/reopen calls to retrieve the same triples.
store_file (str, optional) – Directory path for the persistent on-disk Oxigraph store. If None, an in-memory store is used (data is lost when the object is garbage-collected).
structure_store (str, optional) – The structure store to be used.

Raises:

RuntimeError – If oxrdflib is not installed.

Return type:

None

Properties#

atomrdf.properties.get_basis_positions(system)[source]#

Get the basis positions from the given system.

Parameters:: system (object) – The system object containing the structure dictionary.
Returns:: The basis positions if available, otherwise None.
Return type:: numpy.ndarray or None

atomrdf.properties.get_bravais_lattice(structure)[source]#

Get the Bravais lattice of a given system.

Parameters:: structure (object) – The system object for which the Bravais lattice is to be determined.
Returns:: The Bravais lattice of the system, or None if the system’s structure dictionary is not available or the lattice is not found in the dictionary.
Return type:: str or None

atomrdf.properties.get_cell_volume(system)[source]#

Get the volume of the simulation cell.

Parameters:: system (object) – The system object.
Returns:: volume – The volume of the simulation cell.
Return type:: float

atomrdf.properties.get_chemical_composition(structure)[source]#

Get the chemical composition of the system.

Parameters:: structure (object) – The system object.
Returns:: composition – A dictionary containing the chemical elements as keys and their corresponding counts as values.
Return type:: dict
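
The shape of the returned dictionary can be illustrated with the standard library. This is a sketch of the counting step only, using a hypothetical per-atom species list; it is not the function's actual implementation. The second dictionary shows how counts relate to the element ratios mentioned for search_property.

```python
from collections import Counter

# Hypothetical per-atom species list for a four-atom cell.
species = ["Fe", "Fe", "Fe", "C"]

# Element symbol -> number of atoms of that element.
composition = dict(Counter(species))

# Element symbol -> fraction of all atoms.
total = sum(composition.values())
ratios = {element: count / total for element, count in composition.items()}

print(composition)
print(ratios)
```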

atomrdf.properties.get_crystal_structure_name(system)[source]#

Get the name of the crystal structure for a given system.

Parameters:: system (object) – The system object containing the crystal structure information.
Returns:: The name of the crystal structure if available, otherwise None.
Return type:: str or None

atomrdf.properties.get_lattice_angle(system)[source]#

Calculate the lattice angles of a given system.

Parameters:: system (object) – The system object containing the structure information.
Returns:: A list of three lattice angles in degrees. If the structure information is not available, [None, None, None] is returned.
Return type:: list

atomrdf.properties.get_lattice_parameter(system)[source]#

Calculate the lattice parameters of a system.

Parameters:: system (object) – The system object containing information about the atoms and structure.
Returns:: A list containing the lattice parameters of the system. If the lattice constant is not available, [None, None, None] is returned. If the system structure is available, the lattice parameters are calculated based on the box dimensions. Otherwise, the lattice constant is returned for all three dimensions.
Return type:: list

Examples

>>> system = System()
>>> system.atoms._lattice_constant = 3.5
>>> system._structure_dict = {"box": [[1, 0, 0], [0, 1, 0], [0, 0, 1]]}
>>> get_lattice_parameter(system)
[3.5, 3.5, 3.5]

>>> system.atoms._lattice_constant = None
>>> get_lattice_parameter(system)
[None, None, None]

atomrdf.properties.get_lattice_vector(system)[source]#

Get the lattice vector of a system.

Parameters:: system (object) – The system object containing the structure information.
Returns:: A list representing the lattice vector of the system. If the structure dictionary is not available or the lattice vector is not defined, it returns [None, None, None].
Return type:: list

atomrdf.properties.get_number_of_atoms(system)[source]#

Get the number of atoms in the system.

Parameters:: system (object) – The system object.
Returns:: natoms – The number of atoms in the system.
Return type:: int

atomrdf.properties.get_position(system)[source]#

Get the positions of the atoms in the system.

Parameters:: system (object) – The system object containing the atom positions.
Returns:: The positions of the atoms if available, otherwise None.
Return type:: numpy.ndarray or None

atomrdf.properties.get_simulation_cell_angle(system)[source]#

Get the angles between the vectors of the simulation cell.

Parameters:: system (object) – The system object containing the simulation cell information.
Returns:: angles – A list containing the angles between the vectors of the simulation cell.
Return type:: list

atomrdf.properties.get_simulation_cell_length(system)[source]#

Get the length of the simulation cell.

Parameters:: system (object) – The system object.
Returns:: length – A list containing the length of each dimension of the simulation cell.
Return type:: list
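
For reference, cell lengths and angles follow directly from the three cell vectors. The sketch below shows the arithmetic with the standard library. The angle ordering [alpha, beta, gamma] follows the common crystallographic convention and is an assumption here, as are the function names and the example box.

```python
import math


def cell_lengths(vectors):
    """Return the Euclidean length of each cell vector."""
    return [math.sqrt(sum(c * c for c in v)) for v in vectors]


def cell_angles(vectors):
    """Return [alpha, beta, gamma] in degrees.

    alpha is the angle between b and c, beta between a and c,
    gamma between a and b (assumed ordering).
    """
    a, b, c = vectors

    def angle(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return math.degrees(math.acos(dot / norm))

    return [angle(b, c), angle(a, c), angle(a, b)]


# Hypothetical cubic cell with an edge length of 2.87.
box = [[2.87, 0.0, 0.0], [0.0, 2.87, 0.0], [0.0, 0.0, 2.87]]
lengths = cell_lengths(box)
angles = cell_angles(box)
print(lengths, angles)
```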

atomrdf.properties.get_simulation_cell_vector(system)[source]#

Get the simulation cell vector of the given system.

Parameters:: system (object) – The system object containing the simulation cell information.
Returns:: The simulation cell vector of the system.
Return type:: numpy.ndarray

atomrdf.properties.get_spacegroup_number(system)[source]#

Get the spacegroup number of a given system.

Parameters:: system (object) – The system object for which the spacegroup number is to be determined.
Returns:: The spacegroup number of the system if it is available, otherwise None.
Return type:: int or None

atomrdf.properties.get_spacegroup_symbol(system)[source]#

Get the symbol of the spacegroup for a given system.

Parameters:: system (object) – The system object for which to retrieve the spacegroup symbol.
Returns:: The symbol of the spacegroup if available, otherwise None.
Return type:: str or None

atomrdf.properties.get_species(system)[source]#

Get the species of atoms in the given system.

Parameters:: system (System) – The system object containing atoms.
Returns:: A list of species of atoms in the system.
Return type:: list

Visualisation#

atomrdf.visualize.GEXF_CATEGORY_COLORS = {'Calculation': (192, 57, 43), 'Element': (39, 174, 96), 'Literal': (189, 195, 199), 'Material': (155, 89, 182), 'Other': (149, 165, 166), 'Potential': (243, 156, 18), 'Property': (22, 160, 133), 'Sample': (224, 123, 57), 'Structure': (41, 128, 185)}#: RGB colours for each semantic category used in the GEXF / Gephi export.
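
The RGB tuples correspond to the hex codes listed in the to_gexf colour table. A short sketch of the conversion, with the dictionary copied from the value above rather than imported, so that the snippet runs without atomRDF installed; rgb_to_hex is an illustrative helper.

```python
# Copied from the documented value of atomrdf.visualize.GEXF_CATEGORY_COLORS.
GEXF_CATEGORY_COLORS = {
    "Calculation": (192, 57, 43),
    "Element": (39, 174, 96),
    "Literal": (189, 195, 199),
    "Material": (155, 89, 182),
    "Other": (149, 165, 166),
    "Potential": (243, 156, 18),
    "Property": (22, 160, 133),
    "Sample": (224, 123, 57),
    "Structure": (41, 128, 185),
}


def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as an uppercase hex code."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


hex_colors = {name: rgb_to_hex(rgb) for name, rgb in GEXF_CATEGORY_COLORS.items()}
print(hex_colors["Sample"])  # prints #E07B39
```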

atomrdf.visualize.get_string_from_URI(x)[source]#

Extract a presentable string from URI.

Parameters:: x (rdflib.term.URIRef) – The URI object to extract the string from.
Returns:: A tuple containing the presentable string representation of the URI and its type. The string representation is the last part of the URI after splitting by ‘#’ or ‘/’. The type can be either “URIRef” or “BNode”.
Return type:: tuple
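
The splitting rule described above can be sketched on plain strings. This covers only the "last part after ‘#’ or ‘/’" step; the real function also returns a type and may treat edge cases differently. last_uri_part is a hypothetical name.

```python
import re


def last_uri_part(uri: str) -> str:
    """Return the final non-empty segment after splitting on '#' or '/'."""
    parts = [part for part in re.split(r"[#/]", uri) if part]
    return parts[-1] if parts else uri


name = last_uri_part("http://purls.helmholtz-metadaten.de/cmso/AtomicScaleSample")
print(name)
```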

atomrdf.visualize.parse_object(x)[source]#

Parse the given object and return its title and type.

Parameters:: x (RDF term) – The RDF term to parse.
Returns:: A tuple containing the title of the object and its type.
Return type:: tuple

atomrdf.visualize.to_gexf(g, output_file, include_literals=False, positions=None, sizes=None, top_n_labels=None, label_overrides=None, top_label_uris=None, injected_type_map=None)[source]#

Export an RDF graph to GEXF format for visualisation in Gephi.

Nodes are coloured by semantic category:

Category	Colour	Covers
Sample	orange `#E07B39`	`cmso:AtomicScaleSample` instances
Material	purple `#9B59B6`	Material description nodes
Structure	blue `#2980B9`	Crystal-structure / unit-cell nodes
Element	green `#27AE60`	Chemical element / species nodes
Calculation	red `#C0392B`	Simulation / activity nodes
Potential	gold `#F39C12`	Interatomic potential nodes
Property	teal `#16A085`	Calculated-property nodes
Literal	light grey `#BDC3C7`	RDF literal values
Other	grey `#95A5A6`	Ontology terms & everything else

The viz:color attribute written into the GEXF file is read natively by Gephi and drives the default node colour. The category attribute is also stored as a node attribute so it can be used in Gephi’s Partition panel for colour/size adjustments after import.

Parameters:

g (rdflib.Graph) – The graph to serialise (plain, named, or conjunctive graph).
output_file (str) – Destination path for the .gexf file.
include_literals (bool, optional) – Whether to create a node for every literal value. Default is False (drops literal nodes and their edges), which produces a cleaner resource-only graph that is easier to explore in Gephi.
positions (dict, optional) – Mapping {uri_string: (x, y)} of pre-computed layout coordinates. Written as viz:position elements so Gephi uses them directly. When None no position attributes are written.
sizes (dict, optional) – Mapping {uri_string: float} of pre-computed node sizes. Written as viz:size elements. When None Gephi uses its default.
top_n_labels (int, optional) – When set, only the top_n_labels highest-degree nodes keep a visible label string; all other nodes are exported with an empty label. This is useful when opening in Gephi with “Show node labels” enabled — only the most connected nodes will display text. When None (the default) every node keeps its label.
label_overrides (dict, optional) – Mapping {uri_string: display_label} of explicit label replacements. Applied after _gexf_label() so any URI can be given a clean short name (e.g. {"http://www.vasp.at": "VASP"}).

Returns:

output_file – The path of the file that was written.

Return type:

str
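
The optional positions, sizes and label_overrides arguments are plain dictionaries keyed by URI string, so they can be prepared ahead of the export. The following sketch builds them from a hypothetical edge list; the degree-based sizing and the circular layout are arbitrary illustrative choices, not what atomRDF does internally.

```python
import math
from collections import Counter

# Hypothetical (subject, object) URI pairs standing in for graph edges.
edges = [
    ("sample:1", "material:1"),
    ("sample:1", "http://www.vasp.at"),
    ("sample:2", "material:1"),
]

# Node degree: how many edges touch each URI.
degree = Counter()
for subject, obj in edges:
    degree[subject] += 1
    degree[obj] += 1

# Scale node size with degree (base size and factor are arbitrary).
sizes = {uri: 10.0 + 5.0 * count for uri, count in degree.items()}

# Place nodes evenly on a circle of radius 100.
nodes = sorted(degree)
positions = {
    uri: (
        100.0 * math.cos(2.0 * math.pi * i / len(nodes)),
        100.0 * math.sin(2.0 * math.pi * i / len(nodes)),
    )
    for i, uri in enumerate(nodes)
}

# Short display names for selected URIs.
label_overrides = {"http://www.vasp.at": "VASP"}

print(sizes)
```

These dictionaries could then be passed along, for example to_gexf(g, "graph.gexf", positions=positions, sizes=sizes, label_overrides=label_overrides).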

atomrdf.visualize.visualize_graph(g, styledict={'BNode': {'color': '#ffe6ff', 'shape': 'box', 'style': 'filled'}, 'Literal': {'color': '#e6ffcc', 'shape': 'ellipse', 'style': 'filled'}, 'URIRef': {'color': '#ffffcc', 'shape': 'box', 'style': 'filled'}}, rankdir='TB', hide_types=False, workflow_view=False, sample_view=False, size=None, layout='dot')[source]#

Visualizes a graph using Graphviz.

Parameters:

g (dict) – The graph to visualize.
styledict (dict, optional) – A dictionary containing styles for different types of nodes and edges. Default is styledict.
rankdir (str, optional) – The direction of the graph layout. Default is “TB” (top to bottom).
hide_types (bool, optional) – Whether to hide nodes with the “type” attribute. Default is False.
workflow_view (bool, optional) – Whether to enable the workflow view. Default is False.
sample_view (bool, optional) – Whether to enable the sample view. Default is False.
size (str, optional) – The size of the graph. Default is None.
layout (str, optional) – The layout algorithm to use. Default is “dot”.

Returns:

dot – The graph visualization.

Return type:

graphviz.Digraph

Data models#

Structure#

Module defines the basic structure of atomic scale samples, including materials, crystal structures, unit cells, and simulation cells. It also includes the definition of atom attributes and various types of defects.

class atomrdf.datamodels.structure.AtomAttribute(*, id: str | None = None, label: str | None = None, pid: str | None = 'http://purls.helmholtz-metadaten.de/cmso/AtomAttribute', position: Annotated[List[List[float]] | None, SkipValidation()] = None, species: Annotated[List[str] | None, SkipValidation()] = None)[source]#

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class atomrdf.datamodels.structure.AtomicScaleSample(*, id: str | None = None, label: str | None = None, pid: str | None = 'http://purls.helmholtz-metadaten.de/cmso/AtomicScaleSample', material: Material | None = None, simulation_cell: SimulationCell | None = None, atom_attribute: AtomAttribute | None = None, point_defect: PointDefect | None = None, vacancy: Vacancy | None = None, substitutional: Substitutional | None = None, interstitial: Interstitial | None = None, dislocation: Dislocation | None = None, edge_dislocation: EdgeDislocation | None = None, screw_dislocation: ScrewDislocation | None = None, mixed_dislocation: MixedDislocation | None = None, stacking_fault: StackingFault | None = None, grain_boundary: GrainBoundary | None = None, tilt_grain_boundary: TiltGrainBoundary | None = None, twist_grain_boundary: TwistGrainBoundary | None = None, symmetric_tilt_grain_boundary: SymmetricalTiltGrainBoundary | None = None, mixed_grain_boundary: MixedGrainBoundary | None = None, calculated_property: List[CalculatedProperty] | None = [], defect_complex: DefectComplex | None = None)[source]#

classmethod from_file(filename, format='lammps-dump', species=None, lattice=None, lattice_constant=None, basis_box=None, basis_positions=None, repeat=None, graph=None)[source]#

Read structure from file and create an AtomicScaleSample instance.

Parameters:

filename (str) – Path to the structure file
format (str, optional) – File format (default: ‘lammps-dump’). Any format supported by ASE.
species (list, optional) – If provided, LAMMPS types will be matched to species. For example, if types 1 and 2 exist in the input file, and species = [‘Li’, ‘Al’] is given, type 1 will be matched to ‘Li’ and type 2 will be matched to ‘Al’
lattice (str, optional) – Crystal structure name (e.g., ‘bcc’, ‘fcc’, ‘hcp’, ‘diamond’, ‘l12’, ‘b2’). If provided, metadata such as unit cell, space group, etc. are automatically added.
lattice_constant (float, optional) – Lattice constant of the system
basis_box (list of lists, optional) – 3x3 matrix specifying the basis unit cell. Not required if lattice is provided.
basis_positions (list of lists, optional) – Nx3 array specifying relative positions of atoms in the unit cell. Not required if lattice is provided.
repeat (tuple or int, optional) – Number of repetitions of the unit cell in each direction.
graph (KnowledgeGraph, optional) – If provided, the structure will be added to the graph.

Returns:

The created sample instance

Return type:

AtomicScaleSample

Examples

>>> sample = AtomicScaleSample.from_file('structure.lmp', format='lammps-dump')
>>> sample = AtomicScaleSample.from_file('POSCAR', format='vasp',
...                                       lattice='bcc', lattice_constant=2.87)
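
The species argument maps LAMMPS integer types to element symbols by position. A small sketch of that mapping, using a hypothetical list of per-atom types; it illustrates the rule described for the species parameter and is not the library's code.

```python
# Species as they would be passed to from_file(species=...).
species = ["Li", "Al"]

# Hypothetical per-atom LAMMPS types read from a dump file.
lammps_types = [1, 2, 2, 1, 1]

# LAMMPS types start at 1, so type i maps to species[i - 1].
type_to_species = {index + 1: symbol for index, symbol in enumerate(species)}
atom_species = [type_to_species[t] for t in lammps_types]

print(type_to_species)
print(atom_species)
```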

classmethod from_repository(repository='materials_project', api_key=None, material_ids=None, chemical_system=None, is_stable=True, conventional=True, graph=None)[source]#

Fetch structure(s) from an external repository and create AtomicScaleSample instance(s).

Parameters:

repository (str, optional) – Repository name. Currently supports: ‘materials_project’ (default).
api_key (str) – API key for the repository.
material_ids (list of str, optional) – List of material IDs to fetch. For Materials Project, these are mp-ids like [‘mp-149’, ‘mp-13’].
chemical_system (str, optional) – Chemical system string (e.g., ‘Fe-C’, ‘Li-Co-O’). If provided, all stable materials in this system will be fetched.
is_stable (bool, optional) – If True (default), only fetch stable materials. Only used with chemical_system.
conventional (bool, optional) – If True (default), use conventional cell. If False, use primitive cell.
graph (KnowledgeGraph, optional) – If provided, the structure(s) will be added to the graph.

Returns:

If a single material is fetched, returns AtomicScaleSample. If multiple materials are fetched, returns a list of AtomicScaleSample instances.

Return type:

AtomicScaleSample or list of AtomicScaleSample

Raises:

ValueError – If neither material_ids nor chemical_system is provided.
ImportError – If the required repository client library is not installed.

Examples

Fetch a single material by ID:

>>> sample = AtomicScaleSample.from_repository(
...     repository='materials_project',
...     api_key='your_api_key',
...     material_ids=['mp-149']
... )

Fetch all stable materials in a chemical system:

>>> samples = AtomicScaleSample.from_repository(
...     repository='materials_project',
...     api_key='your_api_key',
...     chemical_system='Fe-C'
... )

Fetch and add to graph:

>>> kg = KnowledgeGraph()
>>> sample = AtomicScaleSample.from_repository(
...     repository='materials_project',
...     api_key='your_api_key',
...     material_ids=['mp-149'],
...     graph=kg
... )

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

to_file(outfile, format, copy_from=None, pseudo_files=None)[source]#

Write the structure to a file in the specified format.

Parameters:

outfile (str) – The path to the output file.
format (str, optional) – The format of the output file. Defaults to ‘lammps-dump’.
copy_from (str, optional) – If provided, input options for quantum-espresso format will be copied from the given file. Structure specific information will be replaced. Note that the validity of input file is not checked.
pseudo_files (list, optional) – If provided, add the pseudopotential filenames to file. Should be in alphabetical order of chemical species symbols.

Return type:

None

update_attributes(atoms, repeat=None)[source]#: Update the atom attributes based on the provided ASE Atoms object. This would also reset the id, since the structure has changed.


Activity#

class atomrdf.datamodels.activity.Activity(**data: Any)[source]#

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Dataset#

class atomrdf.datamodels.dataset.Creator(*, id: str | None = None, label: str | None = None, name: str | None = None)[source]#

A person who created a dataset, mapped to foaf:Person.

id (from TemplateMixin): URI identifying the person, e.g. an ORCID.

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class atomrdf.datamodels.dataset.Dataset(*, id: str | None = None, label: str | None = None, identifier: str | None = None, title: str | None = None, creators: ~typing.List[~atomrdf.datamodels.dataset.Creator] = <factory>, publication: ~atomrdf.datamodels.dataset.Publication | None = None, samples: ~typing.List[str] = <factory>)[source]#

A dataset, mapped to dcat:Dataset.

classmethod from_dict(data: dict) → Dataset[source]#: Construct a Dataset from a plain dictionary (e.g., parsed YAML).

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class atomrdf.datamodels.dataset.Publication(*, id: str | None = None, label: str | None = None, identifier: str | None = None, title: str | None = None)[source]#

A bibliographic resource (paper), mapped to dcterms:BibliographicResource.

id (from TemplateMixin): URI identifying the paper.

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Physical quantities#

A class to represent a physical quantity with its value, unit, and associated metadata.

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Build helpers#

atomrdf.build.defect.dislocation(element, slip_system, dislocation_line, elastic_constant_dict, burgers_vector=None, dislocation_type='monopole', crystalstructure=None, a=None, b=None, c=None, alpha=None, covera=None, repeat=1, graph=None, label=None, return_atomman_dislocation=False)[source]#

Notes

This function requires the atomman Python package to be installed.

The elastic_constant_dict parameter should be a dictionary of elastic constants with keys corresponding to the following Voigt notation: “C11”, “C12”, “C13”, “C14”, “C15”, “C16”, “C22”, “C23”, “C24”, “C25”, “C26”, “C33”, “C34”, “C35”, “C36”, “C44”, “C45”, “C46”, “C55”, “C56”, “C66”. The values should be given in GPa.

The dislocation_type parameter can be set to “monopole” or “periodicarray”. If set to “monopole”, a single dislocation will be generated. If set to “periodicarray”, a periodic array of dislocations will be generated.
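
For a cubic crystal only three elastic constants are independent, so the Voigt keys can be filled from C11, C12 and C44. The sketch below expands them into the key names listed above. Whether dislocation() expects only the independent constants or the expanded set is not stated here, so treat the expansion as an assumption. The numerical values are illustrative and the helper name is hypothetical.

```python
def cubic_elastic_constants(c11, c12, c44):
    """Expand the three independent cubic constants into Voigt keys (GPa)."""
    constants = {}
    for key in ("C11", "C22", "C33"):
        constants[key] = c11
    for key in ("C12", "C13", "C23"):
        constants[key] = c12
    for key in ("C44", "C55", "C66"):
        constants[key] = c44
    return constants


# Illustrative values in GPa; not taken from the atomRDF documentation.
elastic_constant_dict = cubic_elastic_constants(c11=169.0, c12=122.0, c44=75.3)
print(elastic_constant_dict)
```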


atomrdf.build.defect.grain_boundary(element, axis, sigma, gb_plane, crystalstructure=None, a=None, b=None, c=None, alpha=None, covera=None, overlap=0.0, gap=0.0, vacuum=0.0, delete_layer='0b0t0b0t', tolerance=0.25, uc_a=1, uc_b=1, repeat=None, graph=None, primitive=False)[source]#

Create a grain boundary system. GB can be created either with AIMSGB or GBCode.

Parameters:#

axis (tuple or list) – The rotation axis of the grain boundary. Used with backends ‘aimsgb’ and ‘gbcode’.
sigma (int) – The sigma value of the grain boundary. Used with backends ‘aimsgb’ and ‘gbcode’.
gb_plane (tuple or list) – The Miller indices of the grain boundary plane. Used with backends ‘aimsgb’ and ‘gbcode’.
backend (str, optional) – The backend used to create the grain boundary. Default is ‘aimsgb’. Some keyword arguments apply only to one backend.
structure (optional) – The lattice structure used to create and populate the grain boundary. Used with backends ‘aimsgb’ and ‘gbcode’.
element (str, optional) – The element symbol to populate the grain boundary with. Used with backends ‘aimsgb’ and ‘gbcode’.
lattice_constant (float, optional) – The lattice constant of the structure. Used with backends ‘aimsgb’ and ‘gbcode’.
repetitions (tuple or list, optional) – The number of repetitions of the structure that will be used to create the GB. Used only with ‘gbcode’. For example, if (2, 3, 4) is provided, each grain will have these repetitions in the (x, y, z) directions. For similar functionality in ‘aimsgb’, use uc_a and uc_b.
overlap (float, optional) – The overlap between adjacent grain boundaries. Used only with ‘gbcode’.
vacuum (float, optional) – Adds space between the grains at one of the two interfaces that must exist due to periodic boundary conditions. Used only with ‘aimsgb’.
gap (float, optional) – Adds space between the grains at both of the two interfaces that must exist due to periodic boundary conditions. Used only with ‘aimsgb’.
delete_layer (str, optional) – Layers of the GB to delete. Used only with ‘aimsgb’.
tolerance (float, optional) – Tolerance factor (in distance units) used to determine whether two atoms are in the same plane. Used only with ‘aimsgb’.
primitive (bool, optional) – Whether to generate a primitive or non-primitive GB structure. Used only with ‘aimsgb’.
uc_a (int, optional) – Number of unit cells of the left grain. Used only with ‘aimsgb’.
uc_b (int, optional) – Number of unit cells of the right grain. Used only with ‘aimsgb’.
graph (atomrdf.KnowledgeGraph, optional) – The graph object to store the system. The system is only added to the KnowledgeGraph if this option is provided.
names (bool, optional) – If True, human-readable names will be assigned to each property. If False, random IDs will be used. Default is False.
label (str, optional) – A label to add to the structure.
add_extras (bool, optional) – Whether to return internal objects of the GB creation process.

Returns:#

atomrdf.System: The grain boundary system.

Notes

This function requires the aimsgb and pymatgen packages to be installed to use the ‘aimsgb’ backend.

repetitions is used only with the ‘gbcode’ backend. For similar functionality in ‘aimsgb’, use uc_a and uc_b. However, repetition in the third direction is not supported in ‘aimsgb’. For a similar effect, after creating the GB, the system.modify.repeat function can be used with (1, 1, u_c).

If ‘gbcode’ is used as the backend, the specific type of GB is determined using the find_gb_character function. When the ‘aimsgb’ backend is used, the same determination is attempted. If the type cannot be found, a normal GB is added in the annotation.

atomrdf.build.defect.interstitial(atoms, element, void_type='tetrahedral', number=1, a=None, threshold=0.01, graph=None)[source]#

Create interstitial defects by adding atoms at void positions.

Parameters:

atoms (ase.Atoms or str) – Either an ASE Atoms object or element symbol for creating bulk structure
element (str or list) – Element symbol(s) for interstitial atom(s)
void_type (str, optional) – Type of void position: ‘tetrahedral’ or ‘octahedral’. Default is ‘tetrahedral’
number (int, optional) – Number of interstitial atoms to add. Default is 1
a (float, optional) – Lattice constant. Required for octahedral voids
threshold (float, optional) – Threshold for finding octahedral positions. Default is 0.01
graph (KnowledgeGraph, optional) – KnowledgeGraph to add the structure to

Returns:

Structure with interstitial defects

Return type:

ase.Atoms

atomrdf.build.defect.stacking_fault(element, slip_plane, displacement_a, displacement_b=0, slip_direction_a=None, slip_direction_b=None, vacuum=0, minwidth=15, even=True, minimum_r=None, relative_fault_position=0.5, crystalstructure=None, a=None, b=None, c=None, alpha=None, covera=None, repeat=1, graph=None)[source]#

Generate a stacking fault structure.

Parameters:

slip_system (list of lists, shape (2 x 3) or (2 x 4)) – The slip system for the given system. The input should be of type [[u, v, w], [h, k, l]], where [u, v, w] is the slip direction and [h, k, l] is the slip plane. For HCP systems, the input should be [[u, v, w, z], [h, k, l, m]].
distance (float) – Distance for translating one half of the cell along the [h k l] direction. Default is 1.

atomrdf.build.defect.substitutional(atoms, element, number=1, indices=None, graph=None)[source]#

Create substitutional defects by replacing atoms with different element(s).

Parameters:

atoms (ase.Atoms) – ASE Atoms object
element (str) – Element symbol for substitutional atom
number (int, optional) – Number of substitutions to make. Default is 1
indices (list or array, optional) – Specific atom indices to substitute. If None, random atoms are chosen
graph (KnowledgeGraph, optional) – KnowledgeGraph to add the structure to

Returns:

Structure with substitutional defects

Return type:

ase.Atoms
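The index selection described above (explicit indices when given, otherwise number randomly chosen atoms) can be illustrated on a plain list of element symbols. This is a sketch of the selection rule only, not atomRDF's implementation: substitute_symbols is a hypothetical helper, and its seed argument is an addition made here for reproducibility.

```python
import random


def substitute_symbols(symbols, element, number=1, indices=None, seed=None):
    """Return a copy of `symbols` with selected entries replaced by `element`.

    Hypothetical helper mirroring the documented parameters; `seed` is an
    extra argument added for this sketch.
    """
    if indices is None:
        # No explicit indices: pick `number` distinct positions at random.
        rng = random.Random(seed)
        indices = rng.sample(range(len(symbols)), number)
    replaced = list(symbols)  # leave the input list untouched
    for index in indices:
        replaced[index] = element
    return replaced


host = ["Fe"] * 8
explicit = substitute_symbols(host, "Cr", indices=[0, 3])
randomised = substitute_symbols(host, "Cr", number=2, seed=0)
```

Passing indices makes the result deterministic, which is useful when the same defect configuration has to be regenerated.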

Workflow parser#

class atomrdf.io.workflow_parser.WorkflowParser(kg: KnowledgeGraph | None = None, precision: int = 6, debug: bool = False, hash_threshold: int | None = 10000)[source]#

Parser for workflow YAML/JSON files into RDF knowledge graph.

Handles parsing of:

Computational samples (with deduplication via hashing)
Workflows/Simulations
Operations (transformations between samples: DeleteAtom, SubstituteAtom, AddAtom, Rotate, Translate, Shear)
Math operations (ASMO arithmetic: Subtraction, Addition, Multiplication, Division, Exponentiation)

kg#

The knowledge graph to populate

Type:: KnowledgeGraph

precision#

Decimal precision for hash computation

Type:: int

sample_map#

Maps original sample IDs to resolved URIs

Type:: dict

property_map#

Maps user-defined property IDs (from YAML ‘id’ fields on calculated_property / input_parameter / output_parameter entries) to their generated KG URI strings. Built incrementally as workflows and math operations are parsed, so later math_operation entries can reference earlier properties by their local ID.

Type:: dict

debug#

If True, print debug messages during parsing

Type:: bool

hash_threshold#

Skip hashing for samples with more than this many atoms. Set to None to disable hashing completely.

Type:: int or None

from_file(filepath: str | Path) → Dict[str, Any][source]#

Parse workflow data from a YAML or JSON file.

This is a convenience method that calls parse() with a file path.

Parameters:: filepath (str or Path) – Path to YAML or JSON file
Returns:: Parse results dictionary
Return type:: dict
Raises:: ValueError – If file format is not supported (must be .yaml, .yml, or .json)

parse(data: str | Path | Dict[str, Any]) → Dict[str, Any][source]#

Parse complete workflow data structure.

Parameters:

data (str, Path, or dict) – Either a file path (str/Path) to a YAML/JSON file, or a dictionary containing computational_sample, workflow, and/or activity keys

Returns:

Dictionary with the following keys:

‘sample_map’ : dict mapping original IDs to URIs
‘workflow_uris’ : list of created workflow URIs
‘operation_uris’ : list of created operation URIs

Return type:

dict

Raises:

ValueError – If file format is not supported (must be .yaml, .yml, or .json)
TypeError – If data type is not supported
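The input handling described for parse() (a dictionary is used directly, a path is loaded by file extension, anything else is rejected) can be sketched as follows. This is an illustration of the documented behaviour, not the library's code: load_workflow_input is a hypothetical name, the order of the checks is an assumption, and YAML loading assumes the third-party PyYAML package is available.

```python
import json
from pathlib import Path

SUPPORTED_SUFFIXES = {".yaml", ".yml", ".json"}


def load_workflow_input(data):
    """Return workflow data as a dict, reading it from disk if needed."""
    if isinstance(data, dict):
        return data
    if isinstance(data, (str, Path)):
        path = Path(data)
        suffix = path.suffix.lower()
        if suffix not in SUPPORTED_SUFFIXES:
            raise ValueError(f"Unsupported file format: {suffix!r}")
        with path.open() as handle:
            if suffix == ".json":
                return json.load(handle)
            # Assumption: PyYAML is installed when YAML input is used.
            import yaml
            return yaml.safe_load(handle)
    raise TypeError(f"Unsupported data type: {type(data).__name__}")
```

In this sketch the extension is checked before the file is opened, so an unsupported suffix raises ValueError even if the file does not exist.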

parse_math_operations(math_op_data_list: List[Dict[str, Any]]) → List[str][source]#

Parse math-operation entries (ASMO arithmetic activities).

Each entry must have a type key (one of Subtraction, Addition, Multiplication, Division, Exponentiation). Operands may be local property-ID strings (resolved via self.property_map) or numeric scalars. If the result carries an id field it is registered in property_map so subsequent math_operation entries can use it as an operand.

Parameters:: math_op_data_list (list of dict)
Returns:: List of math-operation activity-ID strings created.
Return type:: list of str
Raises:: ValueError – If the type field is missing or unrecognised.
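The operand resolution described above can be sketched in isolation. The example below is a toy model, not atomRDF's implementation: resolve_math_operation is a hypothetical function, the entry keys operands and result are assumed names (only type and id are named in the description), and the urn:example URIs are placeholders.

```python
from numbers import Number

MATH_TYPES = {"Subtraction", "Addition", "Multiplication",
              "Division", "Exponentiation"}


def resolve_math_operation(entry, property_map):
    """Resolve operands of one math-operation entry against property_map."""
    op_type = entry.get("type")
    if op_type not in MATH_TYPES:
        raise ValueError(f"Missing or unrecognised type: {op_type!r}")
    resolved = []
    for operand in entry["operands"]:  # 'operands' is an assumed key name
        if isinstance(operand, Number):
            resolved.append(operand)  # numeric scalars pass through
        else:
            # Local property ID; an unknown ID raises KeyError here.
            resolved.append(property_map[operand])
    result_id = entry.get("result", {}).get("id")  # 'result' is assumed
    if result_id is not None:
        # Register the result so later entries can refer to it by ID.
        property_map[result_id] = f"urn:example:property:{result_id}"
    return op_type, resolved


property_map = {
    "e_defect": "urn:example:property:e_defect",
    "e_bulk": "urn:example:property:e_bulk",
}
first = {"type": "Subtraction", "operands": ["e_defect", "e_bulk"],
         "result": {"id": "e_diff"}}
second = {"type": "Division", "operands": ["e_diff", 2], "result": {}}
resolved_first = resolve_math_operation(first, property_map)
resolved_second = resolve_math_operation(second, property_map)
```

Because the first entry registers e_diff, the second entry can use it as an operand, which is why the order of entries matters.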

parse_operations(operation_data_list: List[Dict[str, Any]]) → List[str][source]#

Parse operation data (transformations between samples).

Operations include: DeleteAtom, SubstituteAtom, AddAtom, Rotate, Translate, and Shear.

Parameters:: operation_data_list (list of dict) – List of operation dictionaries. Each must have ‘method’ (the operation type, e.g. ‘DeleteAtom’ or ‘Rotate’), ‘input_sample’ (a sample ID or list of sample IDs) and ‘output_sample’ (a sample ID or list of sample IDs), plus any method-specific parameters (e.g. rotation_matrix for Rotate).
Returns:: List of operation URIs created
Return type:: list of str
Raises:: ValueError – If operation method is not recognized
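As an illustration of the dictionary shape described for parse_operations, the sketch below builds one Rotate entry and checks it against the documented keys and method names. check_operation is a hypothetical helper, the sample IDs are placeholders, and the 3×3 nested-list form of rotation_matrix is an assumption rather than something the description specifies.

```python
KNOWN_METHODS = {"DeleteAtom", "SubstituteAtom", "AddAtom",
                 "Rotate", "Translate", "Shear"}
REQUIRED_KEYS = ("method", "input_sample", "output_sample")


def check_operation(operation):
    """Check one operation dictionary before handing it to a parser."""
    missing = [key for key in REQUIRED_KEYS if key not in operation]
    if missing:
        raise KeyError(f"Operation is missing keys: {missing}")
    if operation["method"] not in KNOWN_METHODS:
        raise ValueError(f"Unrecognised method: {operation['method']!r}")
    return operation


rotate = {
    "method": "Rotate",
    "input_sample": "sample_1",   # placeholder sample ID
    "output_sample": "sample_2",  # placeholder sample ID
    # Assumed layout: 90 degree rotation about z as a nested list.
    "rotation_matrix": [[0, -1, 0], [1, 0, 0], [0, 0, 1]],
}
operation_data_list = [check_operation(rotate)]
```

A list built this way has the form expected for the operation_data_list argument.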

parse_samples(sample_data_list: List[Dict[str, Any]]) → Dict[str, str][source]#

Parse computational sample data and add to knowledge graph.

Performs deduplication via hash-based lookup. If a sample with the same hash already exists, reuses the existing URI. If atom_attribute contains a file_path key, the structure file is read via ASE and atoms are resolved before building the sample object.

Parameters:: sample_data_list (list of dict) – List of sample dictionaries
Returns:: Dictionary mapping original sample IDs to resolved URIs
Return type:: dict
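Hash-based deduplication with a decimal precision and an atom-count threshold can be sketched as follows. This shows the general technique only: the actual quantities atomRDF hashes are not stated here, so hashing symbols and rounded positions with SHA-256 is an assumption, and sample_hash, register_sample and the urn:example URIs are hypothetical.

```python
import hashlib
import json


def sample_hash(symbols, positions, precision=6, hash_threshold=10000):
    """Return a hex digest for a sample, or None when hashing is skipped."""
    if hash_threshold is None or len(symbols) > hash_threshold:
        return None
    # Round coordinates; adding 0.0 turns -0.0 into 0.0 so both hash alike.
    rounded = [[round(c, precision) + 0.0 for c in pos] for pos in positions]
    payload = json.dumps({"symbols": list(symbols), "positions": rounded},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def register_sample(sample_id, symbols, positions, seen, sample_map, **kwargs):
    """Map sample_id to a URI, reusing the URI of an identical sample."""
    digest = sample_hash(symbols, positions, **kwargs)
    uri = f"urn:example:sample:{sample_id}"  # placeholder URI scheme
    if digest is not None:
        uri = seen.setdefault(digest, uri)  # first sample with a hash wins
    sample_map[sample_id] = uri
    return uri


seen, sample_map = {}, {}
symbols = ["Fe", "Fe"]
register_sample("a", symbols, [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
                seen, sample_map)
register_sample("b", symbols, [[1e-9, 0.0, 0.0], [0.5, 0.5, 0.5000000001]],
                seen, sample_map)
```

The two samples differ only below the sixth decimal place, so both IDs resolve to the same URI. Passing hash_threshold=None would skip hashing and give each sample its own URI.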

parse_workflows(workflow_data_list: List[Dict[str, Any]]) → List[str][source]#

Parse workflow/simulation data and add to knowledge graph.

Resolves sample references using the sample_map.

Parameters:: workflow_data_list (list of dict) – List of workflow dictionaries
Returns:: List of workflow URIs created
Return type:: list of str

atomrdf.io.workflow_parser.from_workflow_input(data: str | Path | Dict[str, Any], graph: KnowledgeGraph | None = None, precision: int = 6) → Dict[str, Any][source]#

Main entry point for parsing workflow data.

Parameters:

data (str, Path, or dict) – File path (str/Path) or dictionary containing workflow data
graph (KnowledgeGraph, optional) – Knowledge graph instance. If None, creates a new one.
precision (int, optional) – Decimal precision for hash computation. Default is 6.

Returns:

Dictionary containing parsed results

Return type:

dict

Raises:

TypeError – If data type is not supported (must be str, Path, or dict)

atomrdf.io.workflow_parser.parse_generic(model_class: type, data: Dict[str, Any], graph: KnowledgeGraph | None = None) → Any[source]#

Generic parser for any Pydantic model with .to_graph() method.

Parameters:

model_class (type) – Pydantic model class
data (dict) – Dictionary to parse
graph (KnowledgeGraph, optional) – Knowledge graph instance. If None, model is created but not added to graph.

Returns:

Instance of the model class

Return type:

Any

atomrdf.io.workflow_parser.parse_operation(operation_data: Dict[str, Any], graph: KnowledgeGraph | None = None) → str[source]#

Parse a single operation.

Parameters:

operation_data (dict) – Operation dictionary with ‘method’ field specifying the operation type
graph (KnowledgeGraph, optional) – Knowledge graph instance. If None, creates a new one.

Returns:

URI of the created operation

Return type:

str or None

atomrdf.io.workflow_parser.parse_sample(sample_data: Dict[str, Any], graph: KnowledgeGraph | None = None) → str[source]#

Parse a single computational sample.

Parameters:

sample_data (dict) – Sample dictionary
graph (KnowledgeGraph, optional) – Knowledge graph instance. If None, creates a new one.

Returns:

URI of the created/found sample

Return type:

str or None

atomrdf.io.workflow_parser.parse_workflow(workflow_data: Dict[str, Any], graph: KnowledgeGraph | None = None) → str[source]#

Parse a single workflow.

Parameters:

workflow_data (dict) – Workflow dictionary
graph (KnowledgeGraph, optional) – Knowledge graph instance. If None, creates a new one.

Returns:

URI of the created workflow

Return type:

str or None

atomrdf.io.workflow_parser.parse_workflow_yaml(yaml_data: Dict[str, Any], kg: KnowledgeGraph, precision: int = 6) → Dict[str, str][source]#

Parse workflow YAML data (backwards-compatible function).

Parameters:

yaml_data (dict) – Dictionary containing workflow data
kg (KnowledgeGraph) – Knowledge graph to populate
precision (int, optional) – Decimal precision for hash computation. Default is 6.

Returns:

Dictionary mapping original sample IDs to URIs

Return type:

dict
