API Reference#
Structure#
KnowledgeGraph#
The graph module contains the basic RDF graph object in atomrdf. This object takes a structure as input and annotates it with the CMSO ontology (and with PLDO and PODO as needed). The annotations are stored as triples.
Notes
Always add type triples before adding further properties.
Classes#
KnowledgeGraph: Represents a knowledge graph that stores and annotates structure objects.
- defstyledict#
A dictionary containing default styles for visualizing the graph.
- Type:
dict
- class atomrdf.graph.KnowledgeGraph(graph_file=None, store='Memory', store_file=None, identifier='http://default_graph', ontology=None, structure_store=None, enable_log=False)[source]#
Represents a knowledge graph.
- Parameters:
graph_file (str, optional) – The path to the graph file to be parsed. Default is None.
store (str, optional) – The type of store to use. Default is “Memory”.
store_file (str, optional) – The path to the store file. Default is None.
identifier (str, optional) – The identifier for the graph. Default is “http://default_graph”.
ontology (Ontology, optional) – The ontology object to be used. Default is None.
structure_store (StructureStore, optional) – The structure store object to be used. Default is None.
enable_log (bool, optional) – Whether to enable logging. Default is False. If true, a log file named atomrdf.log will be created in the current working directory.
- graph#
The RDF graph.
- Type:
rdflib.Graph
- sgraph#
The structure graph for a single chosen sample
- Type:
rdflib.Graph
- ontology#
The ontology object.
- Type:
Ontology
- terms#
The dictionary of ontology terms.
- Type:
dict
- store#
The type of store used.
- Type:
str
- add(triple)[source]#
Add a triple to the knowledge graph.
- Parameters:
triple (tuple) – The triple to be added in the form (subject, predicate, object).
- archive(package_name, format='turtle', compress=True, add_simulations=False)[source]#
Publish a dataset from graph including per atom quantities.
- Parameters:
package_name (str) – The name of the package to be created.
format (str, optional) – The format in which the dataset should be written. Default is “turtle”.
compress (bool, optional) – Whether to compress the package into a tarball. Default is True.
- Raises:
ValueError – If the package_name already exists or if the tarball already exists.
Notes
This method creates a package containing a dataset from the graph, including per atom quantities. The package consists of a folder named package_name, which contains the dataset and related files. If compress is True, the package is compressed into a tarball.
The method performs the following steps:
1. Checks if the package_name already exists. If it does, raises a ValueError.
2. If compress is True, checks if the tarball already exists. If it does, raises a ValueError.
3. Creates a folder named package_name.
4. Creates a subfolder named rdf_structure_store within the package folder.
5. Copies the files associated with each sample to the rdf_structure_store folder, while fixing the paths.
6. Updates the paths in the graph to point to the copied files.
7. Writes the dataset to a file named “triples” within the package folder.
8. If compress is True, compresses the package folder into a tarball.
9. Removes the package folder.
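The steps listed above can be sketched with the standard library alone. This is a minimal illustration, not atomrdf's implementation: `archive_package`, its `sample_files` and `triples_text` arguments, and the `<package_name>.tar.gz` tarball name are assumptions made for the example, and the sketch removes the package folder only after compressing it.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def archive_package(package_name, sample_files, triples_text, compress=True):
    """Hypothetical stand-in for the archive steps; not atomrdf's code."""
    package = Path(package_name)
    tarball = Path(f"{package_name}.tar.gz")  # assumed tarball naming
    # Steps 1-2: refuse to overwrite an existing folder or tarball.
    if package.exists():
        raise ValueError(f"{package} already exists")
    if compress and tarball.exists():
        raise ValueError(f"{tarball} already exists")
    # Steps 3-4: create the package folder and the structure-store subfolder.
    store = package / "rdf_structure_store"
    store.mkdir(parents=True)
    # Steps 5-6: copy each sample file and rewrite its path in the triples.
    for src in map(Path, sample_files):
        shutil.copy(src, store / src.name)
        triples_text = triples_text.replace(str(src), f"rdf_structure_store/{src.name}")
    # Step 7: write the serialised dataset to a file named "triples".
    (package / "triples").write_text(triples_text)
    # Steps 8-9: compress, then remove the uncompressed folder.
    if compress:
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(package, arcname=package.name)
        shutil.rmtree(package)
        return tarball
    return package


# Usage: archive one dummy sample file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_1.json"
    sample.write_text("{}")
    target = str(Path(tmp) / "dataset")
    result = archive_package(target, [sample], f'<s> <p> "{sample}" .')
    archive_created = result.exists()
    folder_removed = not Path(target).exists()
    try:
        archive_package(target, [sample], "")
        second_call_rejected = False
    except ValueError:
        second_call_rejected = True
```

The second call fails because the tarball from the first call already exists, which mirrors the ValueError described under Raises.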
- close(filename, format='json-ld')[source]#
Close the graph and write it to a file.
- Parameters:
filename (str) – Name of the output file.
- Return type:
None
- close_store()[source]#
Release the underlying store (close file handles and locks).
This is a no-op for the in-memory store. For file-backed stores (Oxigraph, SQLAlchemy) it releases the file lock so the same store directory can be reopened in the same process or by another process.
- Return type:
None
- create_node(namestring, classtype, label=None)[source]#
Create a new node in the graph.
- Parameters:
namestring (str) – The name of the node.
classtype (Object from a given ontology) – The class type of the node.
- Returns:
The newly created node.
- Return type:
URIRef
- get_sample_as_structure(sample_id)[source]#
Retrieve a sample from the graph as an AtomicScaleSample object.
- Parameters:
sample_id (str or URIRef) – The ID of the sample to retrieve
- Returns:
The sample as an AtomicScaleSample pydantic object
- Return type:
AtomicScaleSample
Examples
>>> kg = KnowledgeGraph()
>>> sample = kg.get_sample_as_structure('sample:123')
>>> atoms = sample.to_structure()  # Convert to ASE Atoms
>>> sample.to_file('output.lmp', format='lammps-dump')
- merge_archive(package_name, compress=True, format='turtle')[source]#
Merge an archived dataset into this KnowledgeGraph.
Unlike unarchive (which creates a new graph), this method loads the triples and structure-store files from an existing archive into the current graph so that multiple datasets can be combined incrementally:
>>> kg = KnowledgeGraph()
>>> kg.merge_archive("dataset_1_GB.tar.gz")
>>> kg.merge_archive("dataset_2_GB.tar.gz")
>>> # kg now contains both datasets
- Parameters:
package_name (str) – Path to the archive. When compress is True (default) this should be a .tar.gz file; otherwise the name of an already-extracted directory.
compress (bool, optional) – Whether package_name is a compressed tarball. Default True.
format (str, optional) – RDF serialisation format of the triples file inside the archive. Default "turtle".
Notes
Structure-store JSON files from the archive are copied into self.structure_store. UUID-based filenames make collisions extremely unlikely; if a file already exists, a warning is emitted and the file is skipped (the existing copy wins).
After parsing, every CMSO.hasPath triple that still references the archive-internal rdf_structure_store/ prefix is rewritten to point at self.structure_store.
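The collision rule described in these notes (warn, skip, existing copy wins) can be illustrated with a small standard-library sketch. The helper name `copy_structure_files` and the directory names are hypothetical and chosen for the example; this is not the code used by merge_archive.

```python
import shutil
import tempfile
import warnings
from pathlib import Path


def copy_structure_files(archive_store, target_store):
    """Copy JSON files, keeping any file that already exists in the target."""
    target = Path(target_store)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(archive_store).glob("*.json")):
        dest = target / src.name
        if dest.exists():
            # The existing copy wins: warn and leave it untouched.
            warnings.warn(f"{dest.name} already exists; keeping the existing copy")
            continue
        shutil.copy(src, dest)
        copied.append(dest.name)
    return copied


# Usage: a.json exists in both places, b.json only in the archive.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "rdf_structure_store"
    dst = Path(tmp) / "my_structure_store"
    src.mkdir()
    dst.mkdir()
    (src / "a.json").write_text('{"from": "archive"}')
    (src / "b.json").write_text('{"from": "archive"}')
    (dst / "a.json").write_text('{"from": "existing"}')
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        copied = copy_structure_files(src, dst)
    kept = (dst / "a.json").read_text()
```

Only b.json is copied here; a.json keeps its pre-existing content and triggers a single warning.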
- property n_samples#
Number of samples in the Graph
- purge(force=False)[source]#
Remove all information from the KnowledgeGraph.
- Parameters:
force (bool, optional) – Whether to proceed with purging the graph. Default is False.
- Return type:
None
Notes
This method removes all information from the KnowledgeGraph. If the force parameter is set to False, a warning is issued before proceeding with the purging.
- query(source, destinations=None, return_df=True, num_paths=1, limit=None)[source]#
Execute a SPARQL query on the knowledge graph.
This method supports two query modes:
1. Raw SPARQL query strings (passed as the source parameter).
2. Ontology-based queries using tools4RDF (source as OntoTerm).
- Parameters:
source (str or OntoTerm) – If str: Raw SPARQL query string to execute directly. If OntoTerm: The source ontology term from which paths are to be queried. Access terms via self.ontology.terms (e.g., self.ontology.terms.cmso.AtomicScaleSample).
destinations (list of OntoTerm or OntoTerm, optional) – One or more destination ontology terms to which paths are to be queried. Can be a single term or a list of terms. If None, all properties of the source are returned. Only used when source is an OntoTerm.
return_df (bool, default=True) – If True, returns results as a pandas DataFrame. Otherwise, returns raw query results.
num_paths (int, default=1) – The number of paths to retrieve for each query when multiple paths exist. Only used when source is an OntoTerm.
limit (int, optional) – The maximum number of results to return. If None, no limit is applied. Only used when source is an OntoTerm.
- Returns:
If return_df is True, returns a pandas DataFrame with query results. If return_df is False, returns a list of query results. Returns None if no results are found.
- Return type:
pandas.DataFrame or list or None
Examples
Query with raw SPARQL string:
>>> query = '''
... PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
... PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
... SELECT DISTINCT ?symbol
... WHERE {
...     ?sample cmso:hasNumberOfAtoms ?number .
...     ?sample cmso:hasMaterial ?material .
...     ?material cmso:hasStructure ?structure .
...     ?structure cmso:hasSpaceGroupSymbol ?symbol .
...     FILTER (?number="4"^^xsd:integer)
... }'''
>>> df = kg.query(query)
Query for all AtomicScaleSamples with their space group symbols:
>>> kg = KnowledgeGraph()
>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasSpaceGroupSymbol]
... )
Query with filters (using == operator on terms):
>>> df = kg.query(
...     kg.ontology.terms.cmso.AtomicScaleSample,
...     [kg.ontology.terms.cmso.hasNumberOfAtoms == 4]
... )
Notes
When using ontology terms, this method uses tools4RDF to automatically generate SPARQL queries based on the ontology structure. It handles namespace management, path finding between ontology terms, and result formatting automatically.
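The filter syntax in the last example relies on the == operator producing a term that carries a condition rather than a boolean. A minimal sketch of that general technique follows. The `Term` class and its `to_filter` method are hypothetical and written for illustration; they are not the tools4RDF classes, whose internals may differ.

```python
class Term:
    """Hypothetical stand-in for an ontology term; not the tools4RDF class."""

    def __init__(self, name):
        self.name = name
        self.condition = None

    def __eq__(self, value):
        # Record the comparison on a new term instead of evaluating it.
        filtered = Term(self.name)
        filtered.condition = ("=", value)
        return filtered

    def to_filter(self):
        """Render the recorded condition as a SPARQL FILTER clause."""
        if self.condition is None:
            return ""
        operator, value = self.condition
        return f"FILTER (?{self.name} {operator} {value!r})"


# Usage: the comparison yields a Term that remembers "= 4".
term = Term("hasNumberOfAtoms") == 4
clause = term.to_filter()
```

A query builder can then collect such clauses from the destination terms and append them to the generated SPARQL string.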
- remove(triple)[source]#
Remove a triple from the knowledge graph.
- Parameters:
triple (tuple) – The triple to be removed in the form (subject, predicate, object).
- Return type:
None
Notes
This method removes a triple from the knowledge graph. The triple should be provided as a tuple in the form (subject, predicate, object).
Examples
>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> graph.remove(("Alice", "likes", "Bob"))
- property sample_ids#
Returns a list of the IDs of all samples in the graph.
- property sample_names#
Returns a list of all Sample names in the graph.
- to_file(sample, filename, format='lammps-data', copy_from=None, pseudo_files=None)[source]#
Write a sample structure to a file.
- Parameters:
sample (str or URIRef) – Sample ID
filename (str) – Name of the output file
format (str, optional) – Format of the output file. Default is ‘lammps-data’. Any format supported by ASE can be used.
copy_from (str, optional) – If provided, input options for quantum-espresso format will be copied from the given file. Structure specific information will be replaced. Note that the validity of input file is not checked.
pseudo_files (list, optional) – If provided, add the pseudopotential filenames to file. Should be in alphabetical order of chemical species symbols.
- Return type:
None
Examples
>>> kg = KnowledgeGraph()
>>> kg.to_file('sample:123', 'output.lmp', 'lammps-data')
>>> kg.to_file('sample:456', 'POSCAR', 'vasp')
- triples(triple)[source]#
Return the triples in the knowledge graph that match the given triple pattern.
- Parameters:
triple (tuple) – The triple pattern to match in the form (subject, predicate, object).
- Returns:
A generator that yields the matching triples.
- Return type:
generator
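Triple-pattern matching can be illustrated without any RDF library. The sketch below assumes the rdflib convention that None acts as a wildcard in a pattern; whether atomrdf passes patterns through unchanged is not stated here, and `match_triples` is a hypothetical helper working on plain tuples.

```python
def match_triples(store, pattern):
    """Yield every triple in `store` that matches `pattern`.

    A None entry in the pattern matches any value in that position.
    """
    for triple in store:
        if all(want is None or want == have for want, have in zip(pattern, triple)):
            yield triple


# Usage: find every "likes" statement, whatever its subject and object.
store = [
    ("Alice", "likes", "Bob"),
    ("Alice", "knows", "Carol"),
    ("Bob", "likes", "Carol"),
]
likes = list(match_triples(store, (None, "likes", None)))
nothing = list(match_triples(store, ("Carol", None, None)))
```

Like the method documented above, the helper is a generator, so results are produced lazily and must be wrapped in list() to be materialised.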
- classmethod unarchive(package_name, compress=True, store='Memory', store_file=None, identifier='http://default_graph', ontology=None)[source]#
Unarchives a package and returns an instance of the KnowledgeGraph class.
- Parameters:
package_name (str) – The name of the package to unarchive.
compress (bool, optional) – Whether the package is a compressed tarball. Defaults to True.
store (str, optional) – The type of store to use. Defaults to “Memory”.
store_file (str, optional) – The file to use for the store. Defaults to None.
identifier (str, optional) – The identifier for the graph. Defaults to “http://default_graph”.
ontology (str, optional) – The ontology to use. Defaults to None.
- Returns:
An instance of the KnowledgeGraph class.
- Return type:
KnowledgeGraph
- Raises:
FileNotFoundError – If the package file is not found.
tarfile.TarError – If there is an error while extracting the package.
- value(arg1, arg2)[source]#
Get the value of a triple in the knowledge graph.
- Parameters:
arg1 (object) – The subject of the triple.
arg2 (object) – The predicate of the triple.
- Returns:
The value of the triple if it exists, otherwise None.
- Return type:
object or None
Notes
This method retrieves the value of a triple in the knowledge graph. The triple is specified by providing the subject and predicate as arguments. If the triple exists in the graph, the corresponding value is returned. If the triple does not exist, None is returned.
Examples
>>> graph = KnowledgeGraph()
>>> graph.add(("Alice", "likes", "Bob"))
>>> value = graph.value("Alice", "likes")
>>> print(value)
Bob
- visualise(styledict=None, rankdir='BT', hide_types=False, workflow_view=False, sample_view=False, size=None, layout='neato')[source]#
Visualize the RDF tree of the Graph.
- Parameters:
styledict (dict, optional) – If provided, allows customization of color and other properties.
rankdir (str, optional) – The direction of the graph layout. Default is “BT” (bottom to top).
hide_types (bool, optional) – Whether to hide the types in the visualization. Default is False.
workflow_view (bool, optional) – Whether to enable the workflow view. Default is False.
sample_view (bool, optional) – Whether to enable the sample view. Default is False.
size (tuple, optional) – The size of the visualization. Default is None.
layout (str, optional) – The name of the layout algorithm for the graph. Default is “neato”.
- Returns:
The visualization of the RDF tree.
- Return type:
graphviz.dot.Digraph
Notes
The styledict parameter allows customization of the visualization style. It has the following options:
- BNode:
color (str) – The color of the BNode boxes.
shape (str) – The shape of the BNode boxes.
style (str) – The style of the BNode boxes.
- URIRef:
color (str) – The color of the URIRef boxes.
shape (str) – The shape of the URIRef boxes.
style (str) – The style of the URIRef boxes.
- Literal:
color (str) – The color of the Literal boxes.
shape (str) – The shape of the Literal boxes.
style (str) – The style of the Literal boxes.
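One way a caller might assemble a complete styledict is to start from a set of defaults and override only the entries of interest. The sketch below shows that pattern. The default values are placeholders invented for the example (the real ones are held in defstyledict), and `merge_styles` is a hypothetical helper, not part of atomrdf.

```python
import copy

# Placeholder defaults for illustration only; the actual defaults are
# stored in KnowledgeGraph.defstyledict and may differ.
DEFAULT_STYLE = {
    "BNode": {"color": "lightgray", "shape": "box", "style": "filled"},
    "URIRef": {"color": "lightblue", "shape": "box", "style": "filled"},
    "Literal": {"color": "white", "shape": "ellipse", "style": "filled"},
}


def merge_styles(overrides=None, defaults=DEFAULT_STYLE):
    """Return a copy of `defaults` with per-node-type overrides applied."""
    merged = copy.deepcopy(defaults)
    for node_type, options in (overrides or {}).items():
        merged.setdefault(node_type, {}).update(options)
    return merged


# Usage: change only the Literal color and keep everything else.
style = merge_styles({"Literal": {"color": "yellow"}})
```

The deep copy keeps the defaults untouched, so repeated calls with different overrides do not interfere with each other.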
- visualize(*args, **kwargs)[source]#
Visualizes the graph using the specified arguments.
This method is a wrapper around the visualise method and passes the same arguments to it.
- Parameters:
*args – Variable length argument list, passed on to visualise.
**kwargs – Arbitrary keyword arguments, passed on to visualise.
- Returns:
The visualization of the RDF tree.
- Return type:
graphviz.dot.Digraph
Workflow#
Network#
Namespace#
Stores#
- atomrdf.stores.create_store(kg, store, identifier, store_file=None, structure_store=None)[source]#
Create a store based on the given parameters.
- Parameters:
kg (KnowledgeGraph) – The knowledge graph object.
store (str or Project) – The type of store to create. It can be “Memory”, “SQLAlchemy”, or a pyiron Project object.
identifier (str) – The identifier for the store.
store_file (str, optional) – The file path to store the data (only applicable for certain store types).
structure_store (str, optional) – The structure store to use (only applicable for certain store types).
- Raises:
ValueError – If an unknown store type is provided.
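The dispatch behaviour described above (choose a backend from the store argument, reject unknown types) can be sketched as follows. `pick_store_backend` is a hypothetical helper that only returns the name of the handler that would be called. It covers the two string values named in this entry and leaves out pyiron Project objects and any other backends.

```python
def pick_store_backend(store, store_file=None):
    """Return the name of the handler a dispatcher might call (illustrative)."""
    if store == "Memory":
        return "store_memory"
    if store == "SQLAlchemy":
        # A file-backed store needs a location on disk.
        if store_file is None:
            raise ValueError("store_file is required for the SQLAlchemy store")
        return "store_alchemy"
    raise ValueError(f"Unknown store type: {store!r}")


# Usage: a known type resolves, an unknown one is rejected.
memory_handler = pick_store_backend("Memory")
alchemy_handler = pick_store_backend("SQLAlchemy", store_file="graph.db")
try:
    pick_store_backend("Spreadsheet")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```

Failing early on an unrecognised store name keeps a typo from silently falling back to a different backend.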
- atomrdf.stores.store_alchemy(kg, store, identifier, store_file=None, structure_store=None)[source]#
Store the knowledge graph using SQLAlchemy.
- Parameters:
kg (KnowledgeGraph) – The knowledge graph to be stored.
store (str) – The type of store to be used.
identifier (str) – The identifier for the graph.
store_file (str, optional) – The file path for the store. Required if store is not ‘memory’.
structure_store (str, optional) – The structure store to be used.
- Raises:
ValueError – If store_file is None and store is not ‘memory’.
- Return type:
None
- atomrdf.stores.store_memory(kg, store, identifier, store_file=None, structure_store=None)[source]#
Store the knowledge graph in memory.
- Parameters:
kg (KnowledgeGraph) – The knowledge graph to be stored.
store (str) – The type of store to use for storing the graph.
identifier (str) – The identifier for the graph.
store_file (str, optional) – The file to store the graph in. Defaults to None.
structure_store (str, optional) – The structure store to use. Defaults to None.
- Return type:
None
- atomrdf.stores.store_oxigraph(kg, store, identifier, store_file=None, structure_store=None)[source]#
Store the knowledge graph using Oxigraph (via oxrdflib).
- Parameters:
kg (KnowledgeGraph) – The knowledge graph to be stored.
store (str) – The type of store to be used.
identifier (str or URIRef) – The URI identifier for the named graph. Must be consistent across open/reopen calls to retrieve the same triples.
store_file (str, optional) – Directory path for the persistent on-disk Oxigraph store. If None, an in-memory store is used (data is lost when the object is garbage-collected).
structure_store (str, optional) – The structure store to be used.
- Raises:
RuntimeError – If oxrdflib is not installed.
- Return type:
None