DClite4G


This is a mirror page of the old 'Dublin Core lite for Geo' (DClite4G) model (the official version can be found at http://tinyurl.com/kfkyv).

 DClite4G now stands for an enhanced model, which can be found here. See also 'Why_DCLite4G?' and 
 this essay by Jo Walsh (OSGeo).

Dublin Core lite for Geo (DClite4G or DC-Simple)

Some design considerations:

  • This is a minimal metadata information model intended for a metadata exchange protocol for harvesting (e.g. neither a filter nor a GML implementation is needed), following the ideas behind a Simple Catalog Interface protocol.
  • Based on Dublin Core (DC) and Catalogue Services Specification 2.0.1, OGC 04-021r3, p.22.
  • Some Dublin Core properties/attributes need refined semantics.
  • We had a hard time with the abundant use of namespaces. This is because DC specs and other XML 'practices' specialize property/attribute types instead of specializing whole classes.
  • All properties/attributes have cardinality [0..1] except for identifiers (which are mandatory) and for those attributes which really need to be unbounded for automation.
  • Derive as much information as possible in an automated manner, e.g. from the dataset resource.

Details:

  • Services are included in attribute 'format' in the sense that WMS etc. are just protocol bindings to geodata. Real, well-known standalone services like filter or label placement services have a place there too. They could still be detected by challenging them with GetCapabilities (taken from OWS/WxS).
  • Indicating quality of service could be a nice task for the search service provider; there is no need to add it as an attribute.
  • Relationships between features are part of schema metadata: How to handle this...?


Attributes

  • Aligned and rearranged after some discussions on osgeodata-list and geotools-devel-list. Next steps: first consensus on approach/fields, then consensus about which encoding (DC, RDF or GeoRSS?). - Stefan 08:33, 26 September 2006 (CEST)
Figure: Analysis UML class diagram of the minimal metadata information model 'Dublin Core lite for Geo' (DClite4G), which consists of a single entity set (green); two entities/instances/records are shown, a 'dataset' (left, grey) and a 'service' (right, red).

Dublin Core lite for Geo (DClite4G or DC-Simple) (Mandatory subset of DC elements plus georesource relationships; some XML content except dc:identifier may be null/empty):

Attr. name | Card. | Attr. type | Explanation | Equal to iso19115/128? | Status
dc:identifier | [1] | URI | A unique identifier which unambiguously identifies an item within a given context (i.e. a catalogue/repository, a model and a language), used for extracting and referencing metadata. Note that several records can still co-exist with the same dc:identifier, either in different sets implementing different metadata models or because of multiple instances of records using different languages in title, etc. Note also that this identifier is not necessarily the identifier of a resource. To access the resource associated with a metadata record, one must use dct:hasPart and dclite4g:onLineSrc (see also OAI-PMH and GeoTools). | iso19128:Identifier, iso19115: CI_Citation.identifier, else up to (meta-)data manager | Ok?
dc:title | [0..1] | string (XML encoded) | A name given to the resource (regardless e.g. of naming restrictions of file systems). If a layer or file name exists in addition to a title, it can be appended with // (tbd! proposal) at the end of this title string. | iso19128:title, iso19115: CI_Citation.title | Ok
dct:abstract | [0..1] | URI or string (XML encoded) | A description of the resource. Either a URI reference to a human-readable resource or a string. In case of dc:type 'service' (OWS) it's the abstract (string) from the GetCapabilities document (GeoTools ServiceInfo has two fields: abstract and description...?). Subtype of dc:description; we use this as WMS, csw:record and ISO 19115 are using it. | iso19128:abstract, iso19115: MD_DataIdentification.abstract | Ok
dc:type | [0..1] | enumeration | The nature or genre of the content of the resource. 'dataset' or 'service', else 'document', 'text', 'image', 'sound', etc. (see DCMI type (controlled) vocabulary). (Not yet defined for Web Processing Services, which have no specific georesource they are responsible for. What about special documents like OGC:SLD, OGC:WMC etc.?) | If WxS/OWS then 'service', else 'dataset' or another media type, up to (meta-)data manager | Ok
dc:format | [0..1] | URI | A machine-readable reference to either an internet media (MIME) type (recommended) or an XML schema namespace URI linking to a namespace authority. In case of dc:type='service' multiple schemas are to be expected: insert them comma-separated? | iso19128:AuthorityURL, iso19115: AuthorityURL | Ok
dct:spatial | [0..1] | dcmiBox:Box with CRS | Subtype of dc:coverage. (CRS=WGS84; other possible values offered by dct: Point and TGN) | iso19128:EX_BoundingBox, iso19115: EX_BoundingBox | Ok
dct:modified | [0..1] | date | Date of last (published) change of resource (see W3C Encoding rules). Subtype of dc:date. | Timestamp of resource; GetCapabilities | Ok
dc:subject | [0..1] | string (ASCII) | A keyword list; e.g. ISO 19115 classification list. (clarification needed: separated by comma) | KeywordList, iso19115: MD_TopicCategoryCode | Ok
dclite4g:onLineSrc | [0..*] | URI | If dc:type='service' then this is the baseURL to it. If dc:type='dataset' it's either a dc:identifier which points to a metadata record of type 'service' or a dataset URL, e.g. uri:ftp:// host.com/path/filename. Subtype of dc:relation. Note that in some implementations this seems redundant to dc:identifier but in fact this is the real access point of the resource. | If dc:type='dataset' given by provider, if dc:type='service' iso19128:OnlineResource | Ok?
dct:hasPart | [0..*] | URI | A dc:identifier. Only applicable if type 'service'; N/A for type 'dataset'. A dc:identifier which points to a metadata record of dc:type 'dataset' for which the service is 'responsible'. Subtype of dc:relation. Note that in some implementations this seems redundant to dc:identifier but in fact this is the real access point of the resource associated with a metadata record. | iso19128:DataURL, iso19115: MD_metadata.dataSetURI | Ok?
dc:source | [0..1] | URI or string (XML encoded) | Description of the source from which the present resource is derived; lineage information. Either a URI reference to a human-readable resource or a string. (Note: Server base URLs and file URIs are handled elsewhere) | Up to (meta-)data owner | Ok
dc:publisher | [0..1] | string (XML encoded) | Civic address: if type 'service', derived from GetCapabilities ('ContactInformation'), else flattened KML style (= xAL) (still awaiting consensus in other OGC specs. or GeoRSS) | Title element of iso19128:Attribution; OrganisationName element of iso19115:CI_ResponsibleParty | Ok
dc:language | [0..1] | enumeration | RFC 1766 (ISO 639, optionally followed by country ISO 3166). Max. cardinality is intentionally 1, which for the sake of simplicity means that translations of a metadata record into different languages are recorded in several metadata instances and not in multiple dc:language, dc:title or dct:abstract occurrences. | Up to (meta-)data manager; WxS specs. do foresee this | Ok
dc:rights | [0..1] | URI or string (XML encoded) | License information link about the resource. Probably also (link to) disclaimers. Either a URI reference to a human-readable resource or a string (machine-readable licenses are an issue; this could be a starting point). | Up to (meta-)data owner | Ok
dc:relation | [0..*] | URI | A reference to a related resource. DClite4G uses the following subtypes: dct:hasPart and dclite4g:onLineSrc (see above). If using unqualified DC, dc:relation could be used. This could be a machine-readable reference to other metadata providers in order to let harvesters discover other (meta)data providers. Note that OAI-PMH has such a relationship called 'friends' but on the metadata collection/set level. | Placeholder; should be refined in a specific model; meaning up to (meta-)data owner | Ok?

Legend: 'Equal to' means possible to derive from iso19128 (= WxS GetCapabilities).
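
The cardinalities in the table can be checked mechanically. The following is a minimal sketch, not part of the DClite4G draft: it assumes a record is held as a Python dict mapping attribute names to lists of values, and the function name check_record is hypothetical.

```python
# Cardinalities copied from the attribute table: (min, max), None = unbounded.
CARDINALITY = {
    "dc:identifier": (1, 1),
    "dc:title": (0, 1),
    "dct:abstract": (0, 1),
    "dc:type": (0, 1),
    "dc:format": (0, 1),
    "dct:spatial": (0, 1),
    "dct:modified": (0, 1),
    "dc:subject": (0, 1),
    "dclite4g:onLineSrc": (0, None),
    "dct:hasPart": (0, None),
    "dc:source": (0, 1),
    "dc:publisher": (0, 1),
    "dc:language": (0, 1),
    "dc:rights": (0, 1),
    "dc:relation": (0, None),
}


def check_record(record):
    """Return a list of problems; record maps attribute name -> list of values."""
    problems = []
    for name in record:
        if name not in CARDINALITY:
            problems.append(f"unknown attribute: {name}")
    for name, (lowest, highest) in CARDINALITY.items():
        count = len(record.get(name, []))
        if count < lowest:
            problems.append(f"{name}: expected at least {lowest}, found {count}")
        if highest is not None and count > highest:
            problems.append(f"{name}: expected at most {highest}, found {count}")
    # The table states that dct:hasPart only applies to records of type 'service'.
    if record.get("dct:hasPart") and record.get("dc:type") != ["service"]:
        problems.append("dct:hasPart is only applicable if dc:type is 'service'")
    return problems
```

A record with one dc:identifier and several dclite4g:onLineSrc values passes, while a record with two dc:title values or without a dc:identifier is reported.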

Remarks:

  • General:
    • DC attributes/properties left as they are: dc:Audience; dc:Contributor; dc:Creator.
    • All attributes/properties have at most cardinality 1 except dclite4g:onLineSrc and dct:hasPart (and dc:relation from the complementary part of DClite4G).
    • Depending on the modeling approach, even these elements can become cardinality 1. NOTE that datasets (geodata resources) and services (data access services) in principle have a many-to-many relationship: Here a geodata resource (dc:type dataset) can have many dclite4g:onLineSrc elements and a dc:type data_access_service has only one dc:description which can be a GetCapabilities document.
    • No additional DC attributes/properties are required; a few of them need to be specialized (see dct:...).
    • See here for some general explanations about dc/dct.
    • Assume metadata (as opposed to many geodata sets) is always free and open information, like Creative Commons Share Alike.
    • An encoding still has to be discussed (see following example). need schemaLocation in OSGeo!?
  • Details:
    • GetCapabilities adds following attributes (not yet modeled here): Fees, ScaleHint and Style.
    • Sorted out or highly disputed (non-DC) elements: fees, scalehint, harvestinterval.
    • dct:modified and dct:spatial can be sync'ed from dataset.
    • Attribute 'relation': This has not been discussed yet. It simply helps harvesters to discover more (meta)data providers.
    • Attribute 'publisher': Carl mentioned such a structure here which includes StreetAddress, addressee, primaryAddressNumber, streetName, city, state, zipCode, countryCode (like in KML and behind Google geocoding service!?)
    • Keywords are included in attribute 'dc:subject'; I think people have a hard time agreeing on an enumerated list (see the success of folksonomy).
  • Note that OAI-PMH...
    • puts an XML envelope around this metadata and adds a header containing two attributes: 'identifier' to identify a metadata record and 'datestamp' as the date of the last (published) change of the metadata record.
    • requires a name to be defined for metadata sets. Let's not care about this yet.
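
The envelope described in the last remark can be sketched as follows. This is only an illustration using Python's standard xml.etree.ElementTree; the function name wrap_record and the example identifier are hypothetical, while the element names record, header, identifier, datestamp and metadata and the namespace URI are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def wrap_record(identifier, datestamp, metadata_element):
    """Wrap one metadata element in an OAI-PMH <record> with its header."""
    record = ET.Element(f"{{{OAI_NS}}}record")
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    # 'identifier' identifies the metadata record, not the resource itself.
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
    # 'datestamp' is the date of the last change of the metadata record.
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
    metadata.append(metadata_element)
    return record


# Usage with a placeholder payload (identifier value is made up).
payload = ET.Element("{http://www.osgeo.org/schemas/dclite4g/0.01}qualifieddc")
envelope = wrap_record("oai:example.org:f264", "2004-03-01", payload)
print(ET.tostring(envelope, encoding="unicode"))
```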

Guidelines for a minimal OAI-PMH implementation

OAI-PMH means Open Archives Initiative Protocol for Metadata Harvesting. For an introduction to OAI-PMH 2 see here.

This is a draft implementation guideline for a minimal OAI-PMH implementation for geospatial resources which contains the following five steps:

The following are more specific guidelines for a minimal OAI-PMH implementation of a so-called 'data provider' using only the mandatory 'unqualified' Dublin Core (DC):

  • Three operations (verbs) are needed: Identify, ListMetadataFormats and ListRecords.
  • The following operations are not required (initially): ListIdentifiers, ListSets, GetRecord.
  • No incremental harvesting (resumption process for ListXxx operations with more than 1000 records)
  • No compression as defined in the OAI-PMH spec. (compression at the lower HTTP level is still possible)
  • Date granularity may be 'day', not seconds (YYYY-MM-DD)
  • Keeping track of deleted records may not be supported (deletedRecord=no)
  • Mandatory DC as the supported data model is sufficient for a start, but with specific semantics (e.g. coverage, relation) (see also example below):
    • dc:description contains dct:abstract
    • dc:coverage contains bounding box encoding as defined in http://georss.org/simple.html#Box
    • dc:date means in fact dct:modified
    • dc:relation is filled in with dclite4g:onLineSrc. If dc:type='service' dct:hasPart can be derived from GetCapabilities.
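
The four mapping rules above can be sketched as a small function. This is an assumption-laden illustration, not a reference implementation: the record is assumed to be a dict mapping attribute names to lists of values, dct:spatial is assumed to be a dict with north/east/south/west keys, and dct:hasPart is simply dropped because the rule says it can be re-derived from GetCapabilities. The box string follows the GeoRSS Simple order (lower corner, then upper corner, latitude first).

```python
def to_unqualified_dc(record):
    """Map a DClite4G record (name -> list of values) to unqualified DC pairs."""
    renames = {
        "dct:abstract": "dc:description",
        "dct:modified": "dc:date",
        "dclite4g:onLineSrc": "dc:relation",
    }
    pairs = []
    for name, values in record.items():
        for value in values:
            if name == "dct:spatial":
                # GeoRSS Simple box: "south west north east".
                box = "{south} {west} {north} {east}".format(**value)
                pairs.append(("dc:coverage", box))
            elif name == "dct:hasPart":
                # Assumption of this sketch: not carried over, see lead-in.
                continue
            else:
                pairs.append((renames.get(name, name), value))
    return pairs


# Usage with fictitious values.
example = {
    "dc:identifier": ["id1"],
    "dct:abstract": ["Elevation data."],
    "dct:spatial": [{"north": "34.353", "east": "-96.223",
                     "south": "28.229", "west": "-108.44"}],
    "dct:modified": ["2004-03-01"],
    "dclite4g:onLineSrc": ["http://example.org/wms"],
}
for element, value in to_unqualified_dc(example):
    print(element, "=", value)
```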

Examples

Some examples of DClite4G/DC-Simple instances/records: (legend: 'literal' is a constant, // is a comment) (http://tinyurl.com/eaaaj)

A web mapping service instance example derived from GetCapabilities: Mapping OGC:WxS GetCapabilities to a service instance. Can also be called a 'data access point' (Note GetCapabilities finally needs to be delivered by some service owner!):

 dc:identifier      = baseURI of the service // in fact just an id to identify a metadata record
 dc:title           = WxS Service/Title
 dct:abstract       = WxS Service/Abstract
 dc:type            = 'service'
 dc:format          = namespace to OGC:WxS schema.xsd
 dct:spatial        = Root BoundingBox // from Capabilities XML
 dct:modified       = timestamp // e.g. from HTTP header or updateSequence
 dc:subject         = WxS Service/KeywordList
 dclite4g:onLineSrc = baseURI of WxS // seems redundant to id but is the real link to the service
 dct:hasPart        = a dc:identifier which points to each Layer element
 dct:hasPart        = another dc:identifier, etc.
 dc:source          = null // N/A. Note: Not meant as an OnlineResource
 dc:publisher       = WxS Service ContactInformation/Organization
 dc:language        = maybe HTTP header for lang (ISO 639 code), soon supported by WMS
 dc:rights          = WxS Services Fees/AccessConstraints
 dc:relation        = definition up to metadata provider
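
The mapping above can be sketched for a WMS 1.1.1 Capabilities document, which uses no XML namespace. This is a hedged illustration only: the function name service_record, the sample document and the "#layername" scheme for dct:hasPart identifiers are assumptions, and attributes such as dct:modified, dc:publisher and dc:language are left out.

```python
import xml.etree.ElementTree as ET


def service_record(capabilities_xml, base_uri):
    """Build a DClite4G-style service record from WMS 1.1.1 Capabilities XML."""
    root = ET.fromstring(capabilities_xml)
    service = root.find("Service")
    keywords = service.findall("KeywordList/Keyword")
    record = {
        "dc:identifier": base_uri,
        "dc:title": service.findtext("Title"),
        "dct:abstract": service.findtext("Abstract"),
        "dc:type": "service",
        "dc:subject": ", ".join(k.text for k in keywords),
        "dclite4g:onLineSrc": base_uri,
        # One dct:hasPart per named layer; the "#name" scheme is hypothetical.
        "dct:hasPart": [f"{base_uri}#{n.text}"
                        for n in root.findall(".//Layer/Name")],
        "dc:rights": service.findtext("AccessConstraints"),
    }
    # Root bounding box, taken from the top-level Layer element.
    box = root.find("Capability/Layer/LatLonBoundingBox")
    if box is not None:
        record["dct:spatial"] = {
            "north": box.get("maxy"), "east": box.get("maxx"),
            "south": box.get("miny"), "west": box.get("minx"),
        }
    return record


# Fictitious, heavily shortened Capabilities document.
SAMPLE = """<WMT_MS_Capabilities version="1.1.1">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>Demo WMS</Title>
    <Abstract>A fictitious service.</Abstract>
    <KeywordList><Keyword>elevation</Keyword><Keyword>contours</Keyword></KeywordList>
    <AccessConstraints>none</AccessConstraints>
  </Service>
  <Capability>
    <Layer>
      <Title>Root</Title>
      <LatLonBoundingBox minx="-108.44" miny="28.229" maxx="-96.223" maxy="34.353"/>
      <Layer><Name>ned</Name><Title>NED</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""

print(service_record(SAMPLE, "http://example.org/wms"))
```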

A dataset example derived from GetCapabilities: Mapping OGC:WxS GetCapabilities to a dataset/georesource/data access point (Note: There is no hasPart relationship in datasets):

 dc:identifier      = 'data access point' to dataset (baseURI plus Layername)
 dc:title           = WxS Layer/Title
 dct:abstract       = WxS Layer/Abstract
 dc:type            = 'dataset' 
 dc:format          = namespace to format
 dct:spatial        = BoundingBox from Layer/BoundingBox
 dct:modified       = timestamp from HTTP header maybe or updateSequence
 dc:subject         = Layer/KeywordList
 dclite4g:onLineSrc = a baseURI of WxS // seems redundant to id but is the real 'data access point'
 dclite4g:onLineSrc = another baseURI of WxS // another dataset binding
 dc:source          = null // Note: Not meant as an OnlineResource
 dc:publisher       = from Service if available?
 dc:language        = from Service if available?
 dc:rights          = from Service if available?
 dc:relation        = definition up to metadata provider

A dataset example delivered by a dataset owner:

 dc:identifier      = a data access point defined in the dataset owner's context
 dc:title           = entered by dataset owner
 dct:abstract       = entered by dataset owner
 dc:type            = 'dataset' 
 dc:format          = namespace to format entered by dataset owner
 dct:spatial        = BoundingBox from data warehouse
 dct:modified       = file timestamp from data warehouse
 dc:subject         = keyword list entered by dataset owner
 dclite4g:onLineSrc = a http URI entered by dataset owner // dataset binding
 dclite4g:onLineSrc = a baseURI of a WxS entered by dataset owner // another binding
 dc:source          = entered by dataset owner // Note: Not online resource
 dc:publisher       = entered by dataset owner
 dc:language        = entered by dataset owner (or derived?)
 dc:rights          = entered by dataset owner
 dc:relation        = definition up to metadata provider

A dataset example with OAI-PMH XML encoding in DClite4G format:

Notes:

  • Example values are for explanation purposes only and purely fictitious.
  • XML Schema (= geometadc.xsd? or dclite4g.xsd?) still tbd.
  • This record is not yet validated!
  • Took 'dclite4g' as envelope name.
 <dclite4g:qualifieddc 
   xmlns:dclite4g="http://www.osgeo.org/schemas/dclite4g/0.01" 
   xmlns:dc="http://purl.org/dc/elements/1.1/" 
   xmlns:dct="http://purl.org/dc/terms/" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="http://www.osgeo.org/schemas/dclite4g/0.01 dclite4g.xsd">
 
   <dc:identifier>www.osgeo.org/geodata/:f264-77d2-09ce-aa39-f0f0</dc:identifier>
   <dc:title>National Elevation Mapping Service for Texas</dc:title>
   <dct:abstract>Elevation data collected for the National Elevation 
                 Dataset (NED).</dct:abstract>
   <dc:type>dataset</dc:type>
   <dc:format>...uri to the schema of the information model 
              (xsd, relaxng, schematron, ili, ...)</dc:format>
   <dct:spatial>
     <Box projection="EPSG:4326" name="Geographic">
       <northlimit>34.353</northlimit>
       <eastlimit>-96.223</eastlimit>
       <southlimit>28.229</southlimit>
       <westlimit>-108.44</westlimit>
     </Box>
   </dct:spatial>
   <dct:modified>2004-03-01</dct:modified>
   <dc:subject>Elevation, Hypsography, and Contours</dc:subject>
   <dclite4g:onLineSrc>uri:http://www.osgeo.org/geodata/ned_grid_georss.xml</dclite4g:onLineSrc>
   <dclite4g:onLineSrc>uri:http://www.osgeo.org/services/wms/</dclite4g:onLineSrc>
   <dclite4g:onLineSrc>uri:http://www.osgeo.org/geodata/ned_grid.shp</dclite4g:onLineSrc>
   <dc:source>Lineage: Based on 30m horizontal and 15m vertical accuracy.</dc:source>
 
   <dc:publisher>U.S. Geological Survey</dc:publisher>
   <dc:language>en</dc:language>
   <dc:rights>uri:http://www.usgs.gov/pubprod/</dc:rights>
 </dclite4g:qualifieddc>
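
A harvester could read such a record with a few lines of code. The sketch below uses Python's standard xml.etree.ElementTree and the namespace URIs from the example; the function name read_record and the returned dict layout are assumptions, and since the schema is still to be defined the element paths may change.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in the example record.
NS = {
    "dclite4g": "http://www.osgeo.org/schemas/dclite4g/0.01",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}


def read_record(xml_text):
    """Extract identifier, type, access points and bounding box from a record."""
    root = ET.fromstring(xml_text)
    # Box and its children carry no namespace in the example.
    box = root.find("dct:spatial/Box", NS)
    bbox = None
    if box is not None:
        bbox = {child.tag: float(child.text) for child in box}
    return {
        "identifier": root.findtext("dc:identifier", namespaces=NS),
        "type": root.findtext("dc:type", namespaces=NS),
        "access_points": [e.text for e in root.findall("dclite4g:onLineSrc", NS)],
        "bbox": bbox,
    }
```

Each dclite4g:onLineSrc value ends up in access_points, so a client can try the listed bindings in turn.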

Same record example as before but with unqualified DC encoding:

Note that this unqualified DC record can be seen as a mapping from DClite4G that relies on its well-defined semantics and content 'encoding'. See the explanations above to understand the semantics of these DC elements:

 ...
 <dc:identifier>www.osgeo.org/geodata/:f264-77d2-09ce-aa39-f0f0</dc:identifier>
 <dc:title>National Elevation Mapping Service for Texas</dc:title>
 <dc:description>Elevation data collected for the National Elevation 
                 Dataset (NED).</dc:description>
 <dc:type>dataset</dc:type>
 <dc:format>...uri to the schema of the information model 
            (xsd, relaxng, schematron, ili, ...)</dc:format>
 <dc:coverage>28.229 -108.44 34.353 -96.223</dc:coverage>
 <dc:date>2004-03-01</dc:date>
 <dc:subject>Elevation, Hypsography, and Contours</dc:subject>
 <dc:relation>uri:http://www.osgeo.org/services/wms/</dc:relation>
 <dc:source>Lineage: Based on 30m horizontal and 15m vertical accuracy.</dc:source>
 ...

Other Relevant Info

  • Simple_Catalog_Interface
  • OSGeodata on GISpunkt Wiki - These pages are about the search for an open, lean and mean "protocol for the incremental exchange of metadata about geographic resources between systems". Profiled specifications like WFS or OAI-PMH are currently on our short list. Delving into 'Open Archives Initiative Protocol for Metadata Harvesting' (OAI-PMH) is strongly encouraged. It's a low-barrier interoperability specification based on a metadata harvesting model, it's stable (subsequent revisions are backwards compatible) and it uses unqualified Dublin Core as its default metadata information model; open source tools exist (like OAICat) and it has been adopted by Google and Yahoo!, among others, but it's not a search protocol.
  • See here for a comparison between CSW, WFS and OAI-PMH.

Weblinks

  • GM03 - Swiss Profile of ISO 19115.