API Reference#

Structure#

KnowledgeGraph#

The graph module contains the basic RDF graph object in atomrdf. This object takes a structure as input and annotates it with the CMSO ontology (and with PLDO and PODO where needed). The annotated structure is stored as triples.

Notes

  • Always add type triples before adding further properties.

Classes#

  • KnowledgeGraph: Represents a knowledge graph that stores and annotates structure objects.

defstyledict#

A dictionary containing default styles for visualizing the graph.

Type:

dict

class atomrdf.graph.KnowledgeGraph(graph_file=None, store='Memory', store_file=None, identifier='http://default_graph', ontology=None, structure_store=None, enable_log=False)[source]#

Represents a knowledge graph.

Parameters:
  • graph_file (str, optional) – The path to the graph file to be parsed. Default is None.

  • store (str, optional) – The type of store to use. Default is “Memory”.

  • store_file (str, optional) – The path to the store file. Default is None.

  • identifier (str, optional) – The identifier for the graph. Default is “http://default_graph”.

  • ontology (Ontology, optional) – The ontology object to be used. Default is None.

  • structure_store (StructureStore, optional) – The structure store object to be used. Default is None.

  • enable_log (bool, optional) – Whether to enable logging. Default is False. If true, a log file named atomrdf.log will be created in the current working directory.

graph#

The RDF graph.

Type:

rdflib.Graph

sgraph#

The structure graph for a single chosen sample

Type:

rdflib.Graph

ontology#

The ontology object.

Type:

Ontology

terms#

The dictionary of ontology terms.

Type:

dict

store#

The type of store used.

Type:

str

add(triple, validate=True)[source]#

Add a triple to the knowledge graph.

triples(triple)[source]#

Return the triples in the knowledge graph that match the given triple pattern.

query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#

Execute a SPARQL query on the knowledge graph using tools4RDF.

get_sample_as_structure(sample_id)[source]#

Retrieve a sample from the graph as an AtomicScaleSample object.

add(triple)[source]#

Add a triple to the knowledge graph.

Parameters:

triple (tuple) – The triple to be added in the form (subject, predicate, object).

archive(package_name, format='turtle', compress=True, add_simulations=False)[source]#

Publish a dataset from graph including per atom quantities.

Parameters:
  • package_name (str) – The name of the package to be created.

  • format (str, optional) – The format in which the dataset should be written. Default is “turtle”.

  • compress (bool, optional) – Whether to compress the package into a tarball. Default is True.

Raises:

ValueError – If the package_name already exists or if the tarball already exists.

Notes

This method creates a package containing a dataset from the graph, including per atom quantities. The package consists of a folder named package_name, which contains the dataset and related files. If compress is True, the package is compressed into a tarball.

The method performs the following steps:

  1. Checks if the package_name already exists. If it does, raises a ValueError.

  2. If compress is True, checks if the tarball already exists. If it does, raises a ValueError.

  3. Creates a folder named package_name.

  4. Creates a subfolder named rdf_structure_store within the package folder.

  5. Copies the files associated with each sample to the rdf_structure_store folder, while fixing the paths.

  6. Updates the paths in the graph to point to the copied files.

  7. Writes the dataset to a file named “triples” within the package folder.

  8. If compress is True, compresses the package folder into a tarball.

  9. Removes the package folder.
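The packaging steps listed for archive can be sketched with the standard library alone. This is a minimal illustration, not atomrdf's implementation: the function name archive_sketch and its sample_files and triples_text arguments are hypothetical, the .tar.gz suffix is assumed, and step 6 (updating paths in the graph) is left out because no graph is involved. The sketch also assumes the folder is removed only after a tarball has been written.

```python
import shutil
import tarfile
from pathlib import Path


def archive_sketch(package_name, sample_files, triples_text, compress=True):
    """Package files and serialized triples; all names here are hypothetical."""
    package = Path(package_name)
    tarball = Path(f"{package_name}.tar.gz")

    # Steps 1-2: refuse to overwrite an existing folder or tarball.
    if package.exists():
        raise ValueError(f"{package} already exists")
    if compress and tarball.exists():
        raise ValueError(f"{tarball} already exists")

    # Steps 3-4: create the package folder and its structure-store subfolder.
    store = package / "rdf_structure_store"
    store.mkdir(parents=True)

    # Step 5: copy each sample file into the subfolder.
    for src in sample_files:
        shutil.copy(src, store / Path(src).name)

    # Step 7: write the dataset to a file named "triples".
    (package / "triples").write_text(triples_text)

    # Steps 8-9: compress the folder, then remove it (assumed to apply
    # only when a tarball was actually written).
    if compress:
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(package, arcname=package.name)
        shutil.rmtree(package)
        return tarball
    return package
```

Calling the sketch twice with the same package_name raises ValueError on the second call, because the tarball from the first call already exists.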

close(filename, format='json-ld')[source]#

Close the graph and write to a file

Parameters:

filename (string) – name of output file

Return type:

None

close_store()[source]#

Release the underlying store (close file handles and locks).

This is a no-op for the in-memory store. For file-backed stores (Oxigraph, SQLAlchemy) it releases the file lock so the same store directory can be reopened in the same process or by another process.

Return type:

None

create_node(namestring, classtype, label=None)[source]#

Create a new node in the graph.

Parameters:
  • namestring (str) – The name of the node.

  • classtype (Object from a given ontology) – The class type of the node.

Returns:

The newly created node.

Return type:

URIRef

get_sample_as_structure(sample_id)[source]#

Retrieve a sample from the graph as an AtomicScaleSample object.

Parameters:

sample_id (str or URIRef) – The ID of the sample to retrieve

Returns:

The sample as an AtomicScaleSample pydantic object

Return type:

AtomicScaleSample

Examples

>>> kg = KnowledgeGraph()
>>> sample = kg.get_sample_as_structure('sample:123')
>>> atoms = sample.to_structure()  # Convert to ASE Atoms
>>> sample.to_file('output.lmp', format='lammps-dump')

merge_archive(package_name, compress=True, format='turtle')[source]#

Merge an archived dataset into this KnowledgeGraph.

Unlike unarchive (which creates a new graph), this method loads the triples and structure-store files from an existing archive into the current graph so that multiple datasets can be combined incrementally:

kg = KnowledgeGraph()
kg.merge_archive("dataset_1_GB.tar.gz")
kg.merge_archive("dataset_2_GB.tar.gz")
# kg now contains both datasets

Parameters:
  • package_name (str) – Path to the archive. When compress is True (default) this should be a .tar.gz file; otherwise the name of an already-extracted directory.

  • compress (bool, optional) – Whether package_name is a compressed tarball. Default True.

  • format (str, optional) – RDF serialisation format of the triples file inside the archive. Default "turtle".

Notes

  • Structure-store JSON files from the archive are copied into self.structure_store. UUID-based filenames make collisions extremely unlikely; if a file already exists, a warning is emitted and the file is skipped (the existing copy wins).

  • After parsing, every CMSO.hasPath triple that still references the archive-internal rdf_structure_store/ prefix is rewritten to point at self.structure_store.
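The two behaviours in these notes (skip-on-collision copying and path rewriting) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: copy_structure_files and rewrite_path are hypothetical helpers, and paths are treated as plain strings rather than CMSO.hasPath triples.

```python
import shutil
import warnings
from pathlib import Path

ARCHIVE_PREFIX = "rdf_structure_store/"  # archive-internal prefix named in the notes


def copy_structure_files(archive_store, structure_store):
    """Copy JSON files, keeping any existing copy (hypothetical helper)."""
    target = Path(structure_store)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(archive_store).glob("*.json")):
        dest = target / src.name
        if dest.exists():
            # The existing copy wins; warn and move on.
            warnings.warn(f"{dest.name} already exists; keeping the existing copy")
            continue
        shutil.copy(src, dest)
        copied.append(dest.name)
    return copied


def rewrite_path(path, structure_store):
    """Point an archive-internal path at the local structure store."""
    if ARCHIVE_PREFIX in path:
        filename = path.split(ARCHIVE_PREFIX, 1)[1]
        return str(Path(structure_store) / filename)
    return path
```

Paths that do not contain the archive-internal prefix are returned unchanged, so applying the rewrite to an already-merged graph leaves it as it is.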

property n_samples#

Number of samples in the Graph

purge(force=False)[source]#

Remove all information from the KnowledgeGraph.

Parameters:

force (bool, optional) – Whether to proceed with purging the graph. Default is False.

Return type:

None

Notes

This method removes all information from the KnowledgeGraph. If the force parameter is set to False, a warning is issued before proceeding with the purging.

query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#

Execute a SPARQL query on the knowledge graph.

This method supports two query modes:

  1. Raw SPARQL query strings (passed as the source parameter)

  2. Ontology-based queries using tools4RDF (source as an OntoTerm)

Parameters:
  • source (str or OntoTerm) – If str: Raw SPARQL query string to execute directly. If OntoTerm: The source ontology term from which paths are to be queried. Access terms via self.ontology.terms (e.g., self.ontology.terms.cmso.AtomicScaleSample).

  • destinations (list of OntoTerm or OntoTerm, optional) – One or more destination ontology terms to which paths are to be queried. Can be a single term or a list of terms. If None, all properties of the source are returned. Only used when source is an OntoTerm.

  • return_df (bool, default=True) – If True, returns results as a pandas DataFrame. Otherwise, returns raw query results.

  • num_paths (int, default=1) – The number of paths to retrieve for each query when multiple paths exist. Only used when source is an OntoTerm.

  • limit (int, optional) – The maximum number of results to return. If None, no limit is applied. Only used when source is an OntoTerm.

Returns:

If return_df is True, returns a pandas DataFrame with query results. If return_df is False, returns a list of query results. Returns None if no results are found.

Return type:

pandas.DataFrame or list or None

Examples

Query with raw SPARQL string:

>>> query = '''
... PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
... PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
... SELECT DISTINCT ?symbol
... WHERE {
...     ?sample cmso:hasNumberOfAtoms ?number .
...     ?sample cmso:hasMaterial ?material .
...     ?material cmso:hasStructure ?structure .
...     ?structure cmso:hasSpaceGroupSymbol ?symbol .
...     FILTER (?number="4"^^xsd:integer)
... }'''
>>> df = kg.query(query)

Query for all AtomicScaleSamples with their space group symbols:

>>> kg = KnowledgeGraph()
>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasSpaceGroupSymbol]
... )

Query with filters (using == operator on terms):

>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasNumberOfAtoms == 4]
... )

Notes

When using ontology terms, this method uses tools4RDF to automatically generate SPARQL queries based on the ontology structure. It handles namespace management, path finding between ontology terms, and result formatting automatically.
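For the raw SPARQL mode, the query string can also be assembled programmatically before it is passed as source. The sketch below is only an illustration: atoms_query is a hypothetical helper that is not part of atomrdf, and the triple patterns are copied from the raw SPARQL example in this section.

```python
CMSO = "http://purls.helmholtz-metadaten.de/cmso/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def atoms_query(number_of_atoms):
    """Build a SPARQL query string for samples with a given atom count."""
    # Coerce to int; non-numeric strings raise ValueError here.
    n = int(number_of_atoms)
    return f"""
PREFIX cmso: <{CMSO}>
PREFIX xsd: <{XSD}>
SELECT DISTINCT ?symbol
WHERE {{
    ?sample cmso:hasNumberOfAtoms ?number .
    ?sample cmso:hasMaterial ?material .
    ?material cmso:hasStructure ?structure .
    ?structure cmso:hasSpaceGroupSymbol ?symbol .
    FILTER (?number = "{n}"^^xsd:integer)
}}"""


query = atoms_query(4)
print(query)
```

The resulting string would then be passed to the method as kg.query(query).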

remove(triple)[source]#

Remove a triple from the knowledge graph.

Parameters:

triple (tuple) – The triple to be removed in the form (subject, predicate, object).

Return type:

None

Notes

This method removes a triple from the knowledge graph. The triple should be provided as a tuple in the form (subject, predicate, object).

Examples

>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> graph.remove(("Alice", "likes", "Bob"))

property sample_ids#

Returns a list of all Samples in the graph

property sample_names#

Returns a list of all Sample names in the graph.

to_file(sample, filename, format='lammps-data', copy_from=None, pseudo_files=None)[source]#

Write a sample structure to a file.

Parameters:
  • sample (str or URIRef) – Sample ID

  • filename (str) – Name of the output file

  • format (str, optional) – Format of the output file. Default is ‘lammps-data’. Any format supported by ASE can be used.

  • copy_from (str, optional) – If provided, input options for quantum-espresso format will be copied from the given file. Structure specific information will be replaced. Note that the validity of input file is not checked.

  • pseudo_files (list, optional) – If provided, add the pseudopotential filenames to file. Should be in alphabetical order of chemical species symbols.

Return type:

None

Examples

>>> kg = KnowledgeGraph()
>>> kg.to_file('sample:123', 'output.lmp', 'lammps-data')
>>> kg.to_file('sample:456', 'POSCAR', 'vasp')

triples(triple)[source]#

Return the triples in the knowledge graph that match the given triple pattern.

Parameters:

triple (tuple) – The triple pattern to match in the form (subject, predicate, object).

Returns:

A generator that yields the matching triples.

Return type:

generator
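Triple-pattern matching can be illustrated without a graph library. The sketch below assumes the rdflib convention that None in a pattern position acts as a wildcard. The match_triples function and the sample data are hypothetical and only mimic the behaviour described here.

```python
def match_triples(store, pattern):
    """Yield triples from `store` matching `pattern`; None matches anything."""
    for triple in store:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# Hypothetical data; plain strings stand in for RDF terms.
store = [
    ("sample:1", "rdf:type", "cmso:AtomicScaleSample"),
    ("sample:1", "cmso:hasNumberOfAtoms", 4),
    ("sample:2", "rdf:type", "cmso:AtomicScaleSample"),
]

# All triples whose subject is sample:1, whatever the predicate and object.
for triple in match_triples(store, ("sample:1", None, None)):
    print(triple)
```

Like the documented method, the sketch returns a generator, so matches are produced lazily and must be wrapped in list() to be reused.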

classmethod unarchive(package_name, compress=True, store='Memory', store_file=None, identifier='http://default_graph', ontology=None)[source]#

Unarchive a package and return a new KnowledgeGraph instance.

Parameters:
  • package_name (str) – The name of the package to unarchive.

  • compress (bool, optional) – Whether the package is a compressed tarball. Defaults to True.

  • store (str, optional) – The type of store to use. Defaults to “Memory”.

  • store_file (str, optional) – The file to use for the store. Defaults to None.

  • identifier (str, optional) – The identifier for the graph. Defaults to “http://default_graph”.

  • ontology (str, optional) – The ontology to use. Defaults to None.

Returns:

An instance of the KnowledgeGraph class.

Return type:

KnowledgeGraph

Raises:
  • FileNotFoundError – If the package file is not found.

  • tarfile.TarError – If there is an error while extracting the package.

value(arg1, arg2)[source]#

Get the value of a triple in the knowledge graph.

Parameters:
  • arg1 (object) – The subject of the triple.

  • arg2 (object) – The predicate of the triple.

Returns:

The value of the triple if it exists, otherwise None.

Return type:

object or None

Notes

This method retrieves the value of a triple in the knowledge graph. The triple is specified by providing the subject and predicate as arguments. If the triple exists in the graph, the corresponding value is returned. If the triple does not exist, None is returned.

Examples

>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> value = graph.value("Alice", "likes")
>>> print(value)
Bob

visualise(styledict=None, rankdir='BT', hide_types=False, workflow_view=False, sample_view=False, size=None, layout='neato')[source]#

Visualize the RDF tree of the Graph.

Parameters:
  • styledict (dict, optional) – If provided, allows customization of color and other properties.

  • rankdir (str, optional) – The direction of the graph layout. Default is “BT” (bottom to top).

  • hide_types (bool, optional) – Whether to hide the types in the visualization. Default is False.

  • workflow_view (bool, optional) – Whether to enable the workflow view. Default is False.

  • sample_view (bool, optional) – Whether to enable the sample view. Default is False.

  • size (tuple, optional) – The size of the visualization. Default is None.

  • layout (str, optional) – The name of the layout algorithm for the graph. Default is “neato”.

Returns:

The visualization of the RDF tree.

Return type:

graphviz.dot.Digraph

Notes

The styledict parameter allows customization of the visualization style. It has the following options:

BNode:
  • color (str) – The color of the BNode boxes.

  • shape (str) – The shape of the BNode boxes.

  • style (str) – The style of the BNode boxes.

URIRef:
  • color (str) – The color of the URIRef boxes.

  • shape (str) – The shape of the URIRef boxes.

  • style (str) – The style of the URIRef boxes.

Literal:
  • color (str) – The color of the Literal boxes.

  • shape (str) – The shape of the Literal boxes.

  • style (str) – The style of the Literal boxes.
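A styledict following the layout above could look like the sketch below. The color, shape, and style values are illustrative Graphviz attribute values, not the library's defaults. The merge_styles function is a hypothetical helper showing one way partial overrides might be layered on a default dictionary.

```python
import copy

# Illustrative values only; not the library's actual defaults.
default_styles = {
    "BNode": {"color": "#ffe6ff", "shape": "box", "style": "filled"},
    "URIRef": {"color": "#ffffcc", "shape": "box", "style": "filled"},
    "Literal": {"color": "#e6ffcc", "shape": "ellipse", "style": "filled"},
}


def merge_styles(defaults, overrides=None):
    """Return a copy of `defaults` updated per node type with `overrides`."""
    merged = copy.deepcopy(defaults)
    for node_type, options in (overrides or {}).items():
        merged.setdefault(node_type, {}).update(options)
    return merged


# Override only the Literal shape; every other option keeps its default.
styledict = merge_styles(default_styles, {"Literal": {"shape": "parallelogram"}})
print(styledict["Literal"])
```

Copying the defaults before updating keeps the shared default dictionary unchanged between calls.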

visualize(*args, **kwargs)[source]#

Visualizes the graph using the specified arguments.

This method is a wrapper around the visualise method and passes the same arguments to it.

Parameters:
  • *args (Variable length argument list.)

  • **kwargs (Arbitrary keyword arguments.)

Returns:

The visualization of the RDF tree.

Return type:

graphviz.dot.Digraph

write(filename, format='json-ld')[source]#

Write the serialised version of the graph to a file

Parameters:
  • filename (string) – name of output file

  • format (string, {'turtle', 'xml', 'json-ld', 'ntriples', 'n3'}) – output format to be written to

Return type:

None

Workflow#

Network#

Namespace#

Stores#

atomrdf.stores.create_store(kg, store, identifier, store_file=None, structure_store=None)[source]#

Create a store based on the given parameters.

Parameters:
  • kg (KnowledgeGraph) – The knowledge graph object.

  • store (str or Project) – The type of store to create. It can be either “Memory”, “SQLAlchemy”, or a pyiron Project object.

  • identifier (str) – The identifier for the store.

  • store_file (str, optional) – The file path to store the data (only applicable for certain store types).

  • structure_store (str, optional) – The structure store to use (only applicable for certain store types).

Raises:

ValueError – If an unknown store type is provided.
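The dispatch described for create_store can be sketched as a lookup from store name to handler. This is a simplified illustration, not the module's code: create_store_sketch and both handlers are hypothetical stand-ins that only return a label, and the pyiron Project branch is omitted. The store_file check follows the rule documented for store_alchemy.

```python
def _memory(identifier, store_file=None):
    """Stand-in for an in-memory store handler."""
    return ("memory", identifier, store_file)


def _sqlalchemy(identifier, store_file=None):
    """Stand-in for a file-backed store handler."""
    # A file-backed store cannot be opened without a file path.
    if store_file is None:
        raise ValueError("store_file is required for the SQLAlchemy store")
    return ("sqlalchemy", identifier, store_file)


# Hypothetical registry keyed by the store names listed above.
HANDLERS = {"Memory": _memory, "SQLAlchemy": _sqlalchemy}


def create_store_sketch(store, identifier, store_file=None):
    """Dispatch on the store name; unknown names raise ValueError."""
    try:
        handler = HANDLERS[store]
    except KeyError:
        raise ValueError(f"Unknown store type: {store!r}") from None
    return handler(identifier, store_file=store_file)


print(create_store_sketch("Memory", "http://default_graph"))
```

With a registry like this, supporting another backend would mean adding one entry to the dictionary and leaving the dispatch function alone.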

atomrdf.stores.store_alchemy(kg, store, identifier, store_file=None, structure_store=None)[source]#

Store the knowledge graph using SQLAlchemy.

Parameters:
  • kg (KnowledgeGraph) – The knowledge graph to be stored.

  • store (str) – The type of store to be used.

  • identifier (str) – The identifier for the graph.

  • store_file (str, optional) – The file path for the store. Required if store is not ‘memory’.

  • structure_store (str, optional) – The structure store to be used.

Raises:

ValueError – If store_file is None and store is not ‘memory’.

Return type:

None

atomrdf.stores.store_memory(kg, store, identifier, store_file=None, structure_store=None)[source]#

Store the knowledge graph in memory.

Parameters:
  • kg (KnowledgeGraph) – The knowledge graph to be stored.

  • store (str) – The type of store to use for storing the graph.

  • identifier (str) – The identifier for the graph.

  • store_file (str, optional) – The file to store the graph in. Defaults to None.

  • structure_store (str, optional) – The structure store to use. Defaults to None.

Return type:

None

atomrdf.stores.store_oxigraph(kg, store, identifier, store_file=None, structure_store=None)[source]#

Store the knowledge graph using Oxigraph (via oxrdflib).

Parameters:
  • kg (KnowledgeGraph) – The knowledge graph to be stored.

  • store (str) – The type of store to be used.

  • identifier (str or URIRef) – The URI identifier for the named graph. Must be consistent across open/reopen calls to retrieve the same triples.

  • store_file (str, optional) – Directory path for the persistent on-disk Oxigraph store. If None, an in-memory store is used (data is lost when the object is garbage-collected).

  • structure_store (str, optional) – The structure store to be used.

Raises:

RuntimeError – If oxrdflib is not installed.

Return type:

None