SPARQL queries against the atomRDF graph

Every sample, simulation cell and crystallographic property created by atomRDF is stored as real RDF triples, so anything you can express in SPARQL is available to you. This notebook shows three styles of querying:

  1. Raw SPARQL (kg.query("SELECT ...")).

  2. Term-builder for ontology-aware queries without writing SPARQL (kg.query_sample, kg.query).

  3. Returning a sample object and operating on it.

from atomrdf import KnowledgeGraph
import atomrdf.build as build
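In at least some atomRDF builds, importing the package also loads atomrdf.mp, which imports mp_api.client (the Materials Project client), so the import above fails with ModuleNotFoundError when mp_api is not installed. A minimal pre-flight check, as a sketch using only the standard library (the helper name missing_modules is hypothetical, not part of atomRDF):

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


# Assumption: "mp_api" is an import-time dependency of the installed atomrdf version.
missing = missing_modules(["mp_api"])
if missing:
    print("Install before importing atomrdf:", ", ".join(missing))
```

Only top-level names are checked here, because find_spec on a dotted name such as "mp_api.client" would first try to import the parent package.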

Build a small heterogeneous database

kg = KnowledgeGraph()
_ = build.bulk("Fe", cubic=True, graph=kg)
_ = build.bulk("Cu", cubic=True, graph=kg)
_ = build.bulk("Si", cubic=True, graph=kg)
_ = build.bulk("Mg", crystalstructure="hcp", graph=kg)
kg.n_samples

1. Raw SPARQL

What are the chemical species in the graph?

q = """
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
SELECT DISTINCT ?symbol
WHERE {
    ?species cmso:hasElementSymbol ?symbol .
}
"""
kg.query(q)

Every sample with exactly two atoms in its cell, together with its space group symbol (note that the query filters on the atom count only, not on the Bravais lattice):

q = """
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?sample ?symbol
WHERE {
    ?sample  cmso:hasNumberOfAtoms ?n ;
             cmso:hasMaterial      ?m .
    ?m       cmso:hasStructure     ?s .
    ?s       cmso:hasSpaceGroupSymbol ?symbol .
    FILTER (?n = "2"^^xsd:integer)
}
"""
kg.query(q)
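The atom count in the FILTER clause is the only part of this query that changes between runs, so it can be factored out. The sketch below builds the same query text for any count; the function name atoms_query is hypothetical, and the triple patterns are copied unchanged from the query above.

```python
def atoms_query(n_atoms):
    """Build the SPARQL text selecting samples with exactly `n_atoms` atoms."""
    # Coerce to int so arbitrary text cannot end up inside the query;
    # non-numeric strings raise ValueError here.
    n = int(n_atoms)
    return f"""
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?sample ?symbol
WHERE {{
    ?sample  cmso:hasNumberOfAtoms ?n ;
             cmso:hasMaterial      ?m .
    ?m       cmso:hasStructure     ?s .
    ?s       cmso:hasSpaceGroupSymbol ?symbol .
    FILTER (?n = "{n}"^^xsd:integer)
}}
"""


# Print the generated text for a four-atom query.
print(atoms_query(4))
```

The returned string can be passed to kg.query in the same way as q above.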

2. Term builder (when the ontology network is available)

The term builder (kg.terms, for example kg.terms.cmso.AtomicScaleSample) lets you express the same query without typing SPARQL. It requires the ontology network to be reachable at construction time; if it is not (e.g. behind a strict firewall), kg.terms will be None and you should fall back to the raw SPARQL form above.

kg.query(
    kg.terms.cmso.AtomicScaleSample,
    [
        kg.terms.cmso.hasSpaceGroupSymbol,
        kg.terms.cmso.hasNumberOfAtoms == 2,
    ],
)
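Since kg.terms may be None, the choice between the two query styles can be wrapped in one function. This is a sketch under that assumption only; the names query_two_atom_samples and TWO_ATOM_SPARQL are hypothetical, and both branches reuse the calls shown in this notebook.

```python
TWO_ATOM_SPARQL = """
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?sample ?symbol
WHERE {
    ?sample  cmso:hasNumberOfAtoms ?n ;
             cmso:hasMaterial      ?m .
    ?m       cmso:hasStructure     ?s .
    ?s       cmso:hasSpaceGroupSymbol ?symbol .
    FILTER (?n = "2"^^xsd:integer)
}
"""


def query_two_atom_samples(kg):
    """Query two-atom samples via the term builder, or raw SPARQL if kg.terms is None."""
    terms = getattr(kg, "terms", None)
    if terms is None:
        # Ontology network was not reachable: fall back to the raw query text.
        return kg.query(TWO_ATOM_SPARQL)
    return kg.query(
        terms.cmso.AtomicScaleSample,
        [
            terms.cmso.hasSpaceGroupSymbol,
            terms.cmso.hasNumberOfAtoms == 2,
        ],
    )
```

Calling query_two_atom_samples(kg) then works in both environments without the caller checking kg.terms first.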

3. Return a single sample and write it out

q = """
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?sample
WHERE {
    ?sample cmso:hasNumberOfAtoms ?n .
    FILTER (?n = "2"^^xsd:integer)
}
"""
df = kg.query(q)
df
sample = df['sample'].values[0]
kg.to_file(sample, "selected.poscar", format="vasp")
! head -10 selected.poscar
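The indexing df['sample'].values[0] above raises an IndexError when the query matches nothing. A small guard makes that case explicit; this is a sketch, the helper name first_sample is hypothetical, and it assumes only that the result supports df["sample"] and iteration (as a pandas DataFrame column does).

```python
def first_sample(df, column="sample"):
    """Return the first entry of `column`, or raise if the query matched nothing."""
    values = list(df[column])
    if not values:
        raise LookupError(f"query returned no rows in column {column!r}")
    return values[0]
```

With this in place, sample = first_sample(df) can replace the direct indexing before the call to kg.to_file.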