REVIEW OF SPATIAL DATA BASES

Prof. Niranjan N Chiplunkar, NMAMIT,Nitte

Abstract : The purpose of a spatial database system is to manage data about some space in two or more dimensions. This paper reviews some known aspects of spatial data bases including the data types, data model, query language, spatial DBMS implementation issue like spatial indexing and spatial DBMS architecture. Spatial database system offer the underlying database technology for applications like geographic information systems (GIS), urban planning,meteorology,astronomy,VLSI design etc.

Introduction

In various fields, there is a need to manage geometric, geographic or spatial data(ie., data related to space). The space of interest can be, for example, the 2-D abstraction of (parts of) the earth’s surface (ie., geographic space, the most prominent example). Other examples are a man-made space (eg.Layout of VLSI design), a volume containing a model of the human brain, or a 3-D space representing the arrangement of chains of protein molecules. Actually empty space is not very interesting, so one really has to deal with collections of objrcts arranged in space. A Spatial database may record information about objects with a clear identity, position and extent within the space. On the other hand, for some physical spaces it is possible to produce 2-D or 3-D raster images of the space(eg.By remote sensing or Computer tomography), and a database system may be able to manage the images. Image analysis or feature extraction techniques can be used to move from a database of images to the spatial databases. Since the requirements and techniques for dealing with the two kinds of information are rather different, it makes sense to distinguish two classes of systems called spatial database system and image database systems respectively.

Spatial data management imposes a number of requirements on database systems. A basic requirement is that adequate concepts are needed to describe the shape and position of objects in space as well as relationship between them. This can be fulfilled by introducing spatial data types or spatial algebras, which provide types such as point, line or region in the 2-D case. Data modeling much also provide concepts for the representation of spatially related collections of objects, for example, a partition of a country into districts, or a highway network. It must be possible to access efficiently objects lying in a particular region of the space, which leads to a need for specialized spatial access methods or index structures. It also must be possible to follow spatial relationships between objects efficiently, which results in a need for specialized spatial join methods. Spatial data call for a graphical representation of query results at the user interface; additionally, the user interface must support the combination(overlaying) of the results of several queries, since the purpose of querying is often to build a tailored picture of the space of interest. Image data management leads to additional requirements, such as efficient access to parts of large multidimensional array, image manipulation operations, interfacing to image analysis packages, the extraction of indexing information for image retrieval by content, and so forth.

Spatial modeling

The objects in the space and the space itself need to be represented using some models. For modeling single objects, the fundamental abstractions are point, line and region. The two most important instances of spatially related collections of objects are partitions and networks. A partition can be viewed as a set of region objects that are required to be disjoint. Partitions can be used to represent thematic maps. A network can be viewed as a graph embedded in the plane, consisting of a set of point objects, forming it’s nodes and a set of line objects describing the geometry of the edges. Networks are ubiquitous in geography (eg. Highways, rivers, power supply lines etc.)

Spatial data types and operation

Systems of spatial data types or spatial algebras can capture the fundamental abstractions for point, line and region, together with relationships between them and operations for composition. ROSE algebra is an example of spatial systems. ROSE algebra offers three data types called points, lines and regions. The type system of the ROSE allows one to describe polymorphic operations by quantification over kinds. Two such kinds are EXT ={lines,regions} and GEO={points,lines,regions}. There are four classes of operations.

(i) Spatial predicates expressing topological relationships : Eg."geo in GEO and "ext1,ext2 in EXT,

geo X regions à bool inside

ext1 X ext2 à bool intersects,meets

(ii) Operations returning atomic spatial data type values : Eg: "geo in GEO

lines X lines à points intersection

regions X regions à regions intersection

geo X geo à geo plus,minus

regions à lines contour

(iii) Spatial operators returning numbers : Eg: " geo1,geo2 in GEO

geo1 X geo2 à real dist

regions à real perimeter,area

(iv) Spatial operations on sets of objects : Eg: "obj in OBJ, " geo1,geo2,geo in GEO

set(obj)X(obj àgeo)à geo sum

set(obj)X(objàgeo1)Xgeo2àset(obj)closest

Spatial Relationships

These make it possible to ask for all objects in a given relationship with a query object. There are several classes of relationships , like

(i) Topological relationships, such as adjacent, inside and disjoint. (ii) Direction relationships, such as above, below, north-of etc. (iii) Metric relationships, such as “distance<100”,”area>=1 sq.mtr”

Querying

The problem of querying is to connect the operations of a spatial algebra (including predicates to express spatial relationships) to the facilities of a DBMS query language. But there are also other aspects that mainly have to do with the fact that spatial data require a graphical presentation of results as well as graphical input of queries or at least SDT (Spatial Data Type) values used in queries. Following subsections consider the fundamental operations needed at the level of manipulating sets of database objects, graphical input and output, and techniques and requirements for extending query languages.

a) Fundamental Operations (Algebra) : From an algebric point of view, operations for manipulating sets of database objects with spatial attributes can be classified as spatial selection, spatial join, spatial function application and other set operations.

Selection is an operation that returns from a set of objects those that fulfil a predicate. Some examples of spatial selection are :

“Find all cities in Dakshina Kannada”(This assuming that Dakshina Kannada exists as a Region value, and inside is available in the spatial algebra): cities select[center inside Dakshina Kannada].

“Find all big cities no more than 50 kms from Udupi “ (Udupi being a POINT value): cities select [dist(center,Udupi)<50 and population>1,00,000]

Spatial join is a join that compares any two objects with a predicate according to their spatial attributes values. Some examples are :

“Combine cities with their states” : cities states join [center inside area]

“For each river, find all cities within less than 50kms” : cities rivers join[dist(center,route)<50]

Spatial Function Application : This indicates how operations of a spatial algebra that compute new SDT values can be used in a query. For example : “For each river going through Udupi, return the name, the part of its geometry lying inside Udupi, and the length of that part” : rivers select[route intersects Udupi]

extend[intersection(route,Udupi){part}]

extend [length(part){plength}]project[rname,part,plength]

Other set operations : These operations manipulate whole sets of spatial objects in a special way; they lie at the interface between a spatial and the DBMS object algebra. Some examples are : overlay,Fusion etc.,

b) Graphical Input and output: Traditional database systems deal with alphanumeric data types whose values can easily be entered through a keyboard and represented textually within a query result. For spatial database system, at least when it is to be used interactively, graphical presentation of SDT(Spatial Data Type) values in query results is essential. It is also important to enter SDT values to be used as “constants” in queries via a graphical input device. Another distinctive characteristic of querying a spatial database is that the goal of querying is in general to obtain a “tailored” picture of the space represented in the database. This means that the information to be retrieved is often not the result of a single query but rather a combination of several queries. These can be fulfilled by offering textual commands in the query language or within the design of a graphical user interface (GUI). A GUI will probably have at least three subwindows : (1) a text window for displaying the textual representation of a collection of objects, containing the alphanumeric attributes of each object, (2) a graphics window containing the overlay of the graphical representations of spatial attributes of several object classes or query results and (3) a text window for entering queries and perhaps displaying system messages.

c) Techniques and requirements for extending query languages : Integrating geometry into a query language has three main aspects :

1) Denoting SDT values as constants in a query and graphical input of such constants : A suggested technique is to use a special keyword within a query such as PICK; parsing the query will lead to an interaction that allows the user to graphically enter the values, for example; select sname from cities where center inside PICK.

2) Expressing the 4 classes of fundamental operations for an embedded spatial algebra : Spatial selection and spatial join operations can be expressed directly in the query language. Spatial function application, although not possible in classical relational algebra, is also in practice provided by query languages. In contrast, the expression of other set operations of a spatial algebra does not fit into the select … from … where (SFW) paradigm. An “object model interface” has been developed by Guting and Scheneider (1993b) which describes some syntactic facilities required to accommodate a spatial algebra completely.

3) Describing the presentation of results. The presentation language needs some embedded general querying capabilities to determine subsets of answers to be shown in specific formats.

Issues in Spatial DBMS implementation

The first issue in the implementation of spatial algebra is that it should be integrated into a database system’s query processing. This means, first of all, that we have to provide representations for the algebra’s types as well as algorithms/procedures for its operations. However it does not suffice just to implement atomic operations efficiently such as a test whether two regions intersect. It is also necessary to consider the use of such predicates within set-oriented query processing(ie., when they occur within a spatial selection or spatial join). Here spatial access methods and spatial join algorithms come into play.

The representation of a value of a spatial data type(eg.A region) has to be simultaneously compatible with two different views, namely, the view of the database system and the view of the spatial algebra. To fulfil the DBMS requirements , the representation must be paged data structure compatible with the DBMS support for long fields or large attribute values. To support efficient loading and storing on disk, it should consist of single contiguous byte block, as long as it is small enough to fit into one page. Otherwise, it can be a large byte block cut into page sized pieces. When a value representation happens to be large, a good strategy is to split it into a small info part, which will contain often used summary information about the value, and an exact geometry part, representing for example, the long sequence of vertices, so that it is possible to load only the info part into a DBMS buffer. For spatial data types, generic approximations may be needed to interface with spatial access methods: for example, each data type must provide access to a bounding box(also called minimum bounding rectangle (MBR)). From both the spatial algebra and the programming language point of view, the representation should be such that it is mapped by the compiler into a single or perhaps a few contiguous areas. For example, it can be defined as a pointer to a record with several fixed size components and a very large array(for the exact geometry) at the end; then one can dynamically allocate the right amount of space for a given value.

The spatial indexing is another important aspect of spatial DBMS implementation. The main purpose of spatial indexing is to support spatial selection, that is to retrieve from a large set of spatial objects (objects with an SDT attribute) those objects in some particular relationship with a query SDT value. A spatial indexing method organizes space and the objects in it so that only parts of the space and a subset of the objects need to be considered. There are two ways to provide spatial indexing : (1) dedicated external spatial data structures are added to the system offering for spatial attributes what a B-tree does for standard attributes, and (2) spatial objects are mapped into a 1-D space so that they can be stored within a standard 1-D index such as a B-tree. Apart from spatial selection, spatial indexing supports also other operations such as spatial join, finding the object closest to a query value etc.

A fundamental idea for spatial indexing and, in fact for all spatial query processing, is the use of approximations. This allows index structures to manage an object in terms of one or more spatial keys, which are much simpler geometric objects than the SDT value itself. A continuous approximation is based on the coordinates of the SDT value itself. The prime example is the bounding box. For grid approximations, space is divided into cells by a regular grid and the SDT value is represented by the set of cells that it intersects. First based on the approximations, a filtering step is executed; it returns a set of candidates that is a superset of the objects fulfilling a predicate. Second, for each candidate in a refinement step the exact geometry is checked. This strategy has more recently been extended to include a second filtering step where more precise approximations of the candidate objects are checked. Due to the use of bounding boxes, most spatial data structures are designed to store either a set of points or a set of rectangles. Apart from operations like insert, delete and member to manage such sets, for stored points, some important queries are : Range query, Nearest neighbor and Distance scan; for rectangles, the queries are Intersection query & Containment query.

A spatial index structure organize objects within a set of buckets. Each bucket has an associated bucket region, a part of space containing all objects stored in the bucket. Like index structures for standard attributes, the structure can be clustering or a secondary index. A clustering index stores the actual spatial objects. An entry in a secondary index is just a spatial key together with a logical pointer to the object in the database.

Spatial DBMS Architectures :

Compared to a standard architecture, following extensions need to be accommodated in a spatial DBMS system: representations for the data types of a spatial algebra, procedures for the atomic operations, spatial index structures, access operations for spatial indexes, filter and refine techniques, spatial join algorithms, spatial data types and operations within data definition and query language, user interface extensions to handle graphical representation and input of SDT values etc. These extensions are incorporated in what is called extensible DBMS architecture. But before this architecture, the GIS(Geographical Information System) which is a spatial database application used two approaches called layered architecture and dual architecture.

In layered architecture, the spatial functionality is implemented on top of a given DBMS, often a commercially available relational system. In this architecture, there are two possible strategies for representing SDT values. The first is to let each tuple represent the coordinates of one point or line segment, and to break the SDT value into pieces. The second possibility is to represent SDT values in “long fields” of the DBMS.

In Dual architecture the top layer integrates two rather independent subsystems : the DBMS, which handles non-spatial data and a spatial subsystem, which stores and manipulates geometries. With this approach, the representation of each spatial object is broken into two pieces. The first part contains the non-spatial attributes and is stored in the DBMS. The second part is the spatial attribute and is kept in data structures implemented directly on top of the file system. The two pieces are connected by logical pointers.

The use of extensible system leads to an integrated architecture which takes the following view :

1. There is no difference in principle between a “standard” data type such as STRING and a spatial data type such as REGION. This includes operations for example, there is no difference in principle between forming the interconnection of two regions and concatenation of two strings.

2. There is no difference in principle between a clustering or secondary index for standard attributes and for spatial attributes.

3. A sort/merge join and a bounding-box join are the same.

4. Except for the cost function, the mechanisms for query optimization should not distinguish spatial or other operations.

Several spatial DBMS prototypes based on extensible systems have been built so far. To mention a few : the PROBE, the DASDBS GEO-Kernel and POSTGRES. PROBE offers an object oriented or functional data model, DASDBS offers a nested relational model and POSTGRES offers extended relational model. This means that extensibility of a system architecture is rather orthogonal to the data model implemented by that architecture.

Conclusion

This survey paper tried to present the major technical concepts for spatial database system including data modeling, SDTs , querying,system implementation and architecture issues. Image database systems were excluded in the discussion. Major area of application of spatial data bases include Geographical Information System (GIS), VLSI layout generation, molecular modeling etc. There is still enough scope for the researchers to design tools that will help in building efficient spatial database systems.

-------------------

Bibliography

1. The VLDB Journal, Volume 3, Number 4 October 1994, A publication of VLDB Endowment.

2. Abel, D.J. and Ooi, B.C. eds. Proceedings of the third International symposium on Large Spatial Databases, Singapore 1993.

3. Guting, R.H. and Schneider, M. Realms : A foundation for spatial data types in database systems (ROSE algebra). Proceedings of the third International symposium on Large Spatial Databases, Singapore 1993.

4. Frank A. Application of DBMS to land information systems. Proceedings of the Seventh International Conference on Very Large DataBases, 1981.

5. Stonebraker M., Rowe L.A., the design of POSTGRES. Proceedings of the SIGMOD Conference, Washington, DC, 1986.

6. Orenstein J.A. and Manola F., PROBE spatial data modeling and query processing in an image database application. IEEE Transaction of Software Engineering, 14:611-629, 1988

7. Schek K.J., Paul H.B. Scholl M.H., and Weikum G., the DASDBS project : Objectives, experiences and future prospects. IEEE Transactions on Knowledge and Data Engineering, 2:25-43, 1990.