MISRC Working Paper Series


A Semantic Object-Oriented Data Access System

Prepared by:

Salvatore T. March

Information and Decision Science
Carlson School of Management
University of Minnesota
271 19th Avenue South
Minneapolis, MN 55455
USA
smarch@csom.umn.edu

Sangkyu Rho*

College of Business Administration
Seoul National University
Seoul 151-742
KOREA
srho@plaza.snu.ac.kr


* Prof. Rho was partially supported by the Institute of Management Research, Seoul National University.



ABSTRACT:

Lack of support for Entity-Relationship (E-R) semantics and the disconnect between object-oriented programming languages (OOPLs) and database languages remain key roadblocks to the effective use of object-orientation in information system development. We have defined SOODAS (Semantic Object-Oriented Data Access System), a purely object-oriented language that supports E-R semantics and set level querying, and provides related development tools. SOODAS is implemented by extending the OOPL Smalltalk with five meta-classes. EntityObject and Relationship provide the necessary capabilities to define entities, attributes, relationships, external identifiers, and constraints. Together with QueryNode, EntityObject provides an object-oriented, multi-entity querying capability. Queries can be arbitrarily complex and can include cycles. Persistence is provided by PermanentObject, of which EntityObject and Relationship are subclasses. EntityInterface provides a standard, re-usable interface screen definition for displaying and maintaining instances of any entity. Since SOODAS is purely object-oriented, it can be seamlessly integrated with any Smalltalk application.




1. Introduction

Object-oriented (OO) systems development is gaining widespread popularity [Guttman and Matthews, 1995; Dewitz, 1996]. Object representations are claimed to be more natural than traditional data and process models since they provide a uniform description of system capabilities. Furthermore, object-oriented development is claimed to be more efficient than traditional development methods because developers can re-use components of the design and implementation [Coad, 1995; Yourdon, 1994; Rumbaugh, et al., 1991].

In practice, much work is necessary to transform a conceptual object model into an information system implementation. This is due, in large part, to the lack of Entity-Relationship (E-R) semantics in Object-Oriented Programming Languages (OOPLs) [Ling and Teo, 1993; Bertino, et al., 1992; Kilian, 1991] and the disconnect between OOPLs and Object-Oriented Database Management Systems (OODBMSs).

Object models [Rumbaugh, et al., 1991] express extended E-R semantics, [1] but OOPLs do not directly support them. Entities are typically implemented as classes with no shared data or operations aside from those explicitly defined by subtyping. Relationships are typically implemented using embedded objects (i.e., pointers [Yourdon, et al., 1995; Narasimhan, et al., 1994; Elmasri et al., 1993]). Such implementations have several disadvantages [Rumbaugh, 1987]. First, meta-data cannot be easily queried; thus it is difficult for users to see the underlying object structure. Second, methods must be written specifically for each Entity class to support standard constraints such as entity integrity and referential integrity. Third, queries that are simple to specify in E-R or relational query languages are extremely difficult to specify in OOPLs since they navigate at the object (instance) level.

ODMG-compliant OODBMSs [Cattell, 1995; O. Deux et al., 1991] resolve some of these problems, but their languages use a functional rather than object-oriented paradigm. This results in an impedance mismatch between the database language and the programming language in which the application is developed [Copeland and Maier, 1984]. Hence, there are significant challenges in binding OODBMS implementations to pure OOPLs such as Smalltalk [Goldberg and Robson, 1989].

To address these problems we have defined a Semantic Object-Oriented Data Access System (SOODAS). SOODAS is a purely object-oriented language that supports E-R semantics and set level querying. It is implemented in five meta-classes: EntityObject, Relationship, QueryNode, PermanentObject, and EntityInterface. These support the definition and maintenance of an E-R model and its data, including entity integrity constraints and relationship cardinality constraints. Queries involving any number of entities can be easily specified and seamlessly integrated into the methods of any object in the application.

The remainder of this paper is organized as follows. Section 2 overviews the basic E-R semantics and discusses prior efforts to support them in object systems. Section 3 presents our object model, detailing the capabilities needed to support structural E-R semantics. Section 4 describes the object definition and manipulation methods and presents a brief example application. Section 5 discusses the SOODAS query language and provides examples illustrating its power. The final section presents limitations of our current system and directions for future research.


2. Basic Entity-Relationship Semantics and Their Support in Object Systems

Entity-Relationship semantics have been presented by numerous authors [Elmasri and Navathe, 1994; Teorey, 1994]. We overview them briefly here and describe their implementation as meta-classes in the next section.

Four basic constructs are common to E-R semantics: entity, attribute, relationship, and identifier. An entity (or entity type) is a category or grouping of things (e.g., people, companies, events) or roles (e.g., employee, customer, assignment). All instances of an entity share a common set of attributes and relationships. An attribute is a characteristic of an entity-instance. A relationship is an association among entities. Each entity in a relationship has a minimum and maximum cardinality, i.e., the minimum and maximum number of relationship instances in which one entity instance must / may participate. In this research we consider only binary relationships without attributes. Higher order relationships and relationships with attributes must be represented by entities. Each entity has at least one set of attributes and / or relationships, termed its (external) identifier, whose values uniquely distinguish its instances. Entity integrity demands that, given the value of an identifier, there is at most one corresponding instance of that entity.

Supertype/subtype or ISA relationships were introduced to increase the fidelity of data models to the real world [Smith and Smith, 1977]. These allow the specification of subsets of an entity's instances (subtypes), having specialized attributes and / or relationships in addition to those of the entity (supertype). This concept, central to the OO paradigm, includes the notion of inheritance. A subtype inherits all attributes and relationships (and behaviors) from its supertype(s).

A system that supports E-R semantics must have a mechanism to define and maintain entities, their attributes, relationships, identifiers, and constraints. It should support access to the definitions as well as to the data. An entity must be able to answer the names of its attributes and relationships as well as selected subsets of its instances. An entity instance must be able to answer values for each of its attributes and instances for each of its relationships. Furthermore, to be effective, set level operations equivalent to relational operations must be supported [Markowitz and Shoshani, 1993; Norrie, 1994].

Seven basic concepts are common to the object paradigm [Rumbaugh, et al., 1991]: object, behavior, method, message, class, inheritance, and polymorphism. An object is an instance. Behavior is what an object can do, i.e., its capabilities. These are implemented in the methods of the object. An object performs its methods (exhibits its behavior) when sent a message. The signature of a message is defined in its associated method. Objects with the same methods (i.e., behaviors) are organized into classes. A class represents a template for a set of objects. Frequently sets of objects share some, but not all, methods. Inheritance is the mechanism whereby classes share methods. Methods can be defined in a class (termed a superclass) and inherited by other classes (termed subclasses). Polymorphism refers to the use of the same method name in different classes, enabling objects in different classes to respond to the same message in different ways.

Support for entities and attributes is provided through the class concept. An entity can be represented as a class with its attributes as instance variables. No direct support is provided for identifiers in the E-R sense. Each instance is given a unique object identifier or OID; however, there is no way to specify that a set of variables must contain a unique value within the set of instances of the class. Aside from the class hierarchy (exclusive subtypes that do not necessarily cover their supertype), relationships are supported only by embedded objects [Rumbaugh, 1987]. Thus, instance-at-a-time navigation of related objects is supported; however, set level queries are not.

All constraint checking must be done by each class. Constraint specification and enforcement must be implemented in the methods of each class representing an entity. Each must have a method to check entity integrity. Each must have methods to check relationship constraints for each relationship in which it participates. Binary relationships can be represented in one or both classes (i.e., one or both directions); hence, the designer must decide where the relationship is to be represented and develop methods in the appropriate classes to enforce the appropriate constraints.

There have been several prior attempts to directly support E-R semantics using object concepts. Rumbaugh [1987] and Shah, et al. [1989] implemented an object-relationship modeling language in C that supports relationships and enforces relationship constraints. However, their approach is functional rather than purely object oriented, does not include meta-data management, and does not include a query language.

Diaz and Paton [1994] propose using meta-classes to extend the OODBMS Adam. They introduce the meta-classes Entity_Concept and Relationship_Concept. However, they do not address entity or relationship integrity constraints at the meta-class level, nor do they propose an object-oriented query language.

The Object Data Management Group (ODMG) has addressed a number of these issues by proposing a standard (ODMG-93) for Object Database Management Systems (ODBMSs) [Cattell, 1995]. However, ODMG-93 does not address issues of metadata management, conceptualizes relationships as embedded objects, and fails to fully represent relationship constraints. Its query language (OQL) is functional, rather than object oriented, adhering primarily to the SQL 92 standard [SQL 92]. Hence, ODMG-93 compliant OODBMSs support some E-R semantics but there are significant challenges in binding OODBMS implementations to pure object concepts and languages (such as Smalltalk). In a pure object environment, there is a single conceptual representation: objects interacting through messages. Functional query languages do not fit the message passing paradigm characteristic of pure object environments. This presents conceptual difficulties to developers who must switch between object and functional paradigms during the development process.

GemStone [Butterworth et al., 1991] provides a seamless integration between its database language OPAL and OOPL Smalltalk. However, its object model lacks E-R semantics and it does not support set-level multi-entity querying.

Recognizing that the basic operations to support E-R semantics are the same across all entities and relationships, parameterized for identifier and relationship constraints, we create classes to perform these operations and allow entities to inherit E-R operations from, or delegate E-R operations to, them. Not only does this reduce the development effort by re-using methods, it also provides for the management of meta-data, which can be queried by users as necessary [Elmasri, et al., 1993; March and Rho, 1996].
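The idea of writing an E-R operation once and parameterizing it per entity can be illustrated with a minimal sketch. It is written in Python rather than Smalltalk and is not the SOODAS implementation: the names EntityBase, add_instance, and find_instance are hypothetical, and the sample attribute values are made up. Each entity class supplies only its identifier; the entity integrity check is inherited.

```python
class EntityBase:
    """Hypothetical shared base class; each entity only names its identifier."""

    identifier = ()  # tuple of attribute names, overridden by each entity

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._instances = {}  # every entity class gets its own instance store

    def __init__(self, **attrs):
        for name, value in attrs.items():
            setattr(self, name, value)

    def id(self):
        # The identifier value is read from whichever attributes the entity declared.
        return tuple(getattr(self, name) for name in type(self).identifier)

    @classmethod
    def add_instance(cls, inst):
        key = inst.id()
        if key in cls._instances:
            # Entity integrity: at most one instance per identifier value.
            raise ValueError(f"{cls.__name__} {key} violates entity integrity")
        cls._instances[key] = inst
        return inst

    @classmethod
    def find_instance(cls, *key):
        return cls._instances.get(tuple(key))


class Employee(EntityBase):
    identifier = ("eno",)


class Department(EntityBase):
    identifier = ("dno",)


Employee.add_instance(Employee(eno="1020", ename="Lee"))      # made-up data
Department.add_instance(Department(dno="001", dname="Sales"))  # made-up data

try:
    Employee.add_instance(Employee(eno="1020", ename="Kim"))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Neither Employee nor Department contains any constraint-checking code of its own, which is the re-use argument made above.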


3. A Class Structure for E-R Semantics

Support for E-R semantics is provided via five meta-classes: EntityObject, Relationship, QueryNode, PermanentObject, and EntityInterface. Their definitions, including class and instance variables and methods, and their position in the Smalltalk class hierarchy are shown in Figure 1. The meta-classes have been implemented as extensions to ParcPlace VisualWorks Smalltalk [VisualWorks, 1994; Lewis, 1995]. This implementation can easily be ported to object-oriented database languages based on Smalltalk such as GemStone's OPAL [Butterworth et al., 1991]. Object, Model and ApplicationModel are system supplied classes. Object is the root of the class hierarchy, supplying such capabilities as instance creation and OIDs. Model and ApplicationModel provide the engine behind the Model-View-Controller (MVC) architecture used to manage user interfaces.

EntityObject and Relationship provide structural E-R semantics. When sent the appropriate object definition language (ODL) message, EntityObject defines an entity as a subclass of itself and generates instance variables and methods for all attributes. Each entity inherits the variables (e.g., Identifier, Partition) and capabilities from EntityObject needed to manage its own data and meta-data, including the ability to define and enforce entity integrity constraints (external identifiers), subtypes and subtype constraints (exclusive or partition).

Similarly, when sent an appropriate ODL message, Relationship defines a relationship as an instance of itself. Relationship manages all relationship data and enforces all relationship constraints. Each relationship is described by its name (name), the related entities (e1 and e2), their cardinality definitions (min1, min2, max1 and max2), access names (access1 and access2) and the related pairs of instances (instances1 and instances2). Thus relationships are not represented by embedded objects, as is often done in object implementations; rather, each Relationship instance holds the related pairs of objects for the relationship it represents.
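A minimal Python sketch (not the SOODAS Smalltalk code) may help show what it means for a relationship object, rather than the entity instances, to hold the related pairs. The parameter names max_partners_of_left and max_partners_of_right are our own simplification and do not reproduce the min1/min2/max1/max2 conventions of the paper; only maximum cardinalities are checked here, and instances are represented by plain strings.

```python
MANY = float("inf")  # stands in for an unbounded maximum cardinality


class Relationship:
    """One object per relationship; it owns every related pair of instances."""

    def __init__(self, name, max_partners_of_left, max_partners_of_right):
        self.name = name
        self.max_partners_of_left = max_partners_of_left
        self.max_partners_of_right = max_partners_of_right
        self.by_left = {}   # left instance  -> list of related right instances
        self.by_right = {}  # right instance -> list of related left instances

    def connect(self, left, right):
        # Refuse the pair if either side is already at its maximum cardinality.
        if len(self.by_left.get(left, [])) >= self.max_partners_of_left:
            raise ValueError(f"{left} is at its maximum in {self.name}")
        if len(self.by_right.get(right, [])) >= self.max_partners_of_right:
            raise ValueError(f"{right} is at its maximum in {self.name}")
        self.by_left.setdefault(left, []).append(right)
        self.by_right.setdefault(right, []).append(left)

    def partners_of_left(self, left):
        return list(self.by_left.get(left, []))

    def partners_of_right(self, right):
        return list(self.by_right.get(right, []))


# An employee reports to at most one department; a department employs many.
report = Relationship("Report", max_partners_of_left=1, max_partners_of_right=MANY)
report.connect("emp-1020", "dept-001")
report.connect("emp-1021", "dept-001")

try:
    report.connect("emp-1020", "dept-002")
    second_department_allowed = True
except ValueError:
    second_department_allowed = False
```

Because the pairs live in one place, the relationship can be traversed from either side without either entity embedding a pointer to the other.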

Each Relationship instance inherits the capability to add and remove entity instance pairs (provided no integrity constraints are violated by doing so) and to answer the related instances for either entity instance in the relationship. For convenience, methods are generated in each entity of a relationship to request these services (see Section 4.1). A relationship can be recursive. For recursive relationships additional methods are generated to access the transitive closure.

Queries are represented as graphs of interconnected QueryNode instances. They are formed by sending object manipulation language (OML) messages to entities and to the QueryNode instances they create. Each entity inherits methods from EntityObject to create a QueryNode instance containing the projection (project), selection (select), and ordering criteria (orderBy) for that entity in a query. Each QueryNode instance inherits methods to appropriately connect itself with another QueryNode instance which specifies another entity in the query and to materialize a query result (Section 5 describes in detail the way in which query graphs are constructed). Thus the query language provides a natural traversal of the E-R model at the entity level.

EntityObject and Relationship inherit persistence from PermanentObject. In an OODBMS such as GemStone, persistence would be provided by the OODBMS itself and PermanentObject could be eliminated. In the current implementation PermanentObject holds all data in instances of the Smalltalk class Dictionary. Dictionaries are maintained in virtual memory and can grow to be many megabytes. The memory manager is responsible for paging virtual memory to and from secondary storage. While there are some efficiency concerns, this implementation demonstrates the feasibility of our approach. Its scalability is currently under investigation.

EntityInterface provides a standard, re-usable interface for displaying and maintaining instances. EntityInterface is a subclass of ApplicationModel [VisualWorks, 1994], from which it inherits methods to open windows and interact with the user. EntityInterface can create subclasses of itself to define entity maintenance screens for any entity. The next section describes the object definition and manipulation methods of SOODAS and shows how an E-R model is defined and maintained. The following section describes the SOODAS query language.


4. SOODAS Object Definition and Manipulation

4.1 Object Definition Language (ODL)

Object definition methods enable the creation of entities, subtypes, attributes, relationships, external identifiers, and constraints. These are defined as class methods of EntityObject and Relationship. Following are the signatures of the corresponding object definition messages. An example illustrating their use is presented below.

CreateEntity: eName attributes: attrList under: appName (sent to EntityObject) creates an entity named eName as a subclass of EntityObject, having the instance variables named in attrList under the category appName. Categories provide a means of organizing entities into applications. Accessor and assignment methods are automatically generated for each instance variable.

CreateSubtype: subtypeName attributes: attrList (sent to an entity) creates a subtype (subclass) of the receiver [2] entity named subtypeName, having the instance variables named in attrList. The subtype inherits all variables and methods from its superclass and is placed in the same category as its superclass.

Partition: boolean (sent to an entity) specifies whether the subclasses of the receiver form a partition (boolean is true) or not (boolean is false). An entity with partitioned subclasses is an abstract class. That is, it has no instances of its own; every instance belongs to one of its subclasses.

new: e1 and: e2 withMin: min1 andMin: min2 withMax: max1 andMax: max2 named: rName accessBy: access1 inverselyBy: access2 (sent to Relationship) creates a new relationship, named rName, between entities e1 and e2 (e1 and e2 can be the same entity) with the specified minimum and maximum cardinalities. For convenience it also creates instance methods in each entity, named for access1 in e1 and access2 in e2, which enable them to easily request access and maintenance services from Relationship. For recursive relationships these include methods that support transitive closure queries.

Identifier: idList (sent to an entity) defines a primary (external) identifier for the entity as the set of attribute and/or relationship methods in idList. This identifier is used to enforce entity integrity. It also generates an instance method of the entity called id which answers the identifier of the instance to which it is sent.

To illustrate these language constructs, we define the E-R model illustrated in Figure 2. What follows is an executable transcript in Smalltalk. Comments are enclosed in double quotes. Many has been declared as a Global variable. An interactive screen (Figure 3) has been developed to send these same messages.

" Define the entities. "
EntityObject CreateEntity: #Employee attributes: 'eno ename ssn exemptions ytdGross ytdFit ytdFICA ytdSit' under: 'Payroll'.
EntityObject CreateEntity: #Department attributes: 'dno dname dbudget' under: 'Payroll'.
EntityObject CreateEntity: #PayCheck attributes: 'checkNo checkDate gross fit sit fica' under: 'Payroll'.
EntityObject CreateEntity: #TaxState attributes: 'stateAbbr taxRate' under: 'Payroll'.

" Define subtypes which Partition Employee i.e., all Employees must be one of the subtypes. "
Employee CreateSubtype: #HourlyEmployee attributes: 'hourlyRate currentHours'.
Employee CreateSubtype: #SalariedEmployee attributes: 'salary'.
Employee CreateSubtype: #SalesPerson attributes: 'commissionRate'.
Employee Partition: true.

" Define the relationships. "
Relationship new: Employee and: Department withMin: 0 andMin: 1 withMax: Many andMax: 1 named: 'Report' accessBy: 'ReportsTo' inverselyBy: 'Employs'.
Relationship new: Employee and: Department withMin: 0 andMin: 0 withMax: 1 andMax: 1 named: 'Manage' accessBy: 'Manages' inverselyBy: 'ManagedBy'.
Relationship new: Employee and: PayCheck withMin: 1 andMin: 0 withMax: 1 andMax: Many named: 'Payroll' accessBy: 'PaidBy' inverselyBy: 'Pays'.
Relationship new: Employee and: TaxState withMin: 0 andMin: 0 withMax: Many andMax: 1 named: 'PayrollTax' accessBy: 'LivesIn' inverselyBy: 'HomeTo'.
Relationship new: Department and: Department withMin: 0 andMin: 0 withMax: Many andMax: 1 named: 'OrgStructure' accessBy: 'ParentUnit' inverselyBy: 'ChildUnit'.

" Define the external identifiers. "
Employee Identifier: 'eno'.
Department Identifier: 'dno'.
PayCheck Identifier: 'checkNo'.
TaxState Identifier: 'stateAbbr'.

As a result of sending these messages, a database schema is defined. A class is created for each entity and subtype with appropriate instance variables. For example, TaxState is a class having two instance variables, stateAbbr and taxRate with appropriately named accessor and assignment methods (the assignment methods have a colon specifying that they take a parameter). It is a subclass of EntityObject, thus inheriting object access and maintenance methods from it. It has an instance method named id that answers the value of stateAbbr, its identifier.
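The effect of generating a class with accessor, assignment, and id methods from a string of attribute names can be approximated in Python as follows. This is only an analogue under stated assumptions, not the SOODAS mechanism: create_entity is a hypothetical helper, assignment methods are named set_<attribute> because Python has no keyword-message syntax, the identifier is passed in the same call rather than by a separate Identifier: message, and the tax rate values are made up.

```python
def create_entity(name, attributes, identifier):
    """Build a class named `name` with generated accessors for each attribute."""
    attr_names = attributes.split()

    def __init__(self, **values):
        # Every declared attribute exists on the instance, defaulting to None.
        for attr in attr_names:
            setattr(self, "_" + attr, values.get(attr))

    namespace = {"__init__": __init__, "attributes": attr_names}

    for attr in attr_names:
        # Default arguments bind the current attribute name to each function.
        def getter(self, _a=attr):
            return getattr(self, "_" + _a)

        def setter(self, value, _a=attr):
            setattr(self, "_" + _a, value)

        namespace[attr] = getter            # accessor, e.g. taxRate()
        namespace["set_" + attr] = setter   # assignment, e.g. set_taxRate(x)

    # id answers the value of the identifier attribute.
    namespace["id"] = lambda self: getattr(self, "_" + identifier)
    return type(name, (object,), namespace)


TaxState = create_entity("TaxState", "stateAbbr taxRate", identifier="stateAbbr")

mn = TaxState(stateAbbr="MN", taxRate=0.08)  # made-up rate
mn.set_taxRate(0.085)
```

The attribute list is also kept on the class (TaxState.attributes), which loosely mirrors the way an entity can answer its own meta-data.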

A Relationship instance is created for each relationship. Each entity has methods for each relationship in which it participates. For example, five relationship methods are generated in TaxState for its relationship, PayrollTax, with Employee. They are named according to its access name, HomeTo. These request appropriate services from Relationship (see Section 4.2 below) to access or assign appropriate Employee instance(s). HomeTo answers a related Employee instance. HomeTo: anEmployee establishes a relationship between the receiver TaxState instance and anEmployee (an instance of Employee). HomeToID answers the identifier of a related Employee instance (i.e., its eno). HomeToID: anEno establishes a relationship between the receiver TaxState instance and the Employee instance identified by anEno. HomeToC answers the collection of Employee instances related to the receiver TaxState instance.

For example, the generated HomeToC is:

HomeToC
    ^Relationship RelatedTo: self using: 'PayrollTax'

It answers the result of sending the message RelatedTo:using: to the class Relationship with parameters self (i.e., the TaxState instance receiving the message) and the relationship name PayrollTax (alternatively, the access names 'HomeTo' or 'LivesIn' could be used as the second parameter).

OrgStructure is a recursive relationship of Department. Five relationship methods are generated for each access name (ParentUnit and ChildUnit). Two additional methods, AllParentUnitC and AllChildUnitC, are also generated to access all ancestor and all descendant departments, respectively. These methods can be used for transitive closure queries.


4.2 Object Manipulation Language (Meta-OML and OML)

In SOODAS data and meta-data are equally accessible. Entity meta-data is accessed using the meta-OML methods inherited from EntityObject (Figure 1). Attributes answers a string containing the names of all of the entity's attributes. Relationships answers the set of relationships of the entity. Identifier answers the external identifier of the entity. subclasses answers its direct subtypes. superclass answers the supertype of the entity.

Relationship meta-data is accessed using the meta-OML methods of Relationship (Figure 1). entities answers a collection containing the two entities participating in the relationship. max: e using: accessName (min: e using: accessName) answers the maximum (minimum) number of instances of the other entity in the relationship that can (must) be related to one instance of entity e using access accessName. Use of these methods is illustrated in Figure 4.

PermanentObject (Figure 1) provides four basic OML methods: Instances, FindInstance:, AddInstance: and DeleteInstance:. These are inherited by EntityObject and Relationship. EntityObject redefines AddInstance: and DeleteInstance: and provides four transaction management methods: UndoAdd:, StartUpdate:, UpdateInstance:, and UndoUpdate:.

Instances answers the collection of instances of the receiver [3] including its descendant subtypes, if any. Since this method answers a collection, Smalltalk collection methods such as do:, select:, collect: and detect: are available to process the results. FindInstance: anID answers the instance of the receiver having the identifier value anID, nil if none. For example, the following Smalltalk code produces a collection of employees who report to department '001'.

AddInstance: anInstance permanently adds anInstance to the receiver. When sent to an entity it also checks entity integrity and relationship cardinality constraints. If any are violated, a warning message is displayed and the operation is canceled using UndoAdd: anInstance. DeleteInstance: anInstance permanently deletes anInstance, provided it exists. When sent to an entity it also checks entity integrity and relationship cardinality constraints. If any would be violated, a warning message is displayed and the operation is canceled.

StartUpdate: anInstance locks anInstance and initializes a log for the cancellation of the update. UpdateInstance: anInstance permanently updates anInstance, provided entity integrity and relationship cardinality constraints are not violated, otherwise a warning message is displayed and the operation is canceled using UndoUpdate: anInstance. These messages are used in instance methods of the class EntityInterface, which, as discussed below (Section 4.3), generates user interface classes as subclasses of itself. Thus, for simple data maintenance activities, the user does not need to even be aware of them.
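The start/update/undo protocol described above can be sketched as a before-image log. The following Python sketch is an illustration under simplifying assumptions, not the SOODAS implementation: the Store class and its method names are hypothetical, instances are plain dictionaries, locking is omitted, and only entity integrity (not relationship cardinality) is checked.

```python
import copy


class Store:
    """Hypothetical instance store keyed by an identifier attribute."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.instances = {}   # identifier value -> instance
        self.update_log = {}  # id(instance) -> copy taken before the update

    def add_instance(self, inst):
        key = inst[self.identifier]
        if key in self.instances:
            return False  # entity integrity would be violated
        self.instances[key] = inst
        return True

    def start_update(self, inst):
        # Keep a before-image so the update can be cancelled later.
        self.update_log[id(inst)] = copy.deepcopy(inst)

    def update_instance(self, inst):
        before = self.update_log.pop(id(inst))
        old_key, new_key = before[self.identifier], inst[self.identifier]
        if new_key != old_key and new_key in self.instances:
            # Undo: restore the before-image in place and reject the update.
            inst.clear()
            inst.update(before)
            return False
        del self.instances[old_key]
        self.instances[new_key] = inst
        return True


departments = Store("dno")
d1 = {"dno": "001", "dname": "Sales"}       # made-up data
d5 = {"dno": "005", "dname": "Marketing"}
departments.add_instance(d1)
departments.add_instance(d5)

departments.start_update(d5)
d5["dno"] = "001"                            # collides with d1's identifier
accepted = departments.update_instance(d5)   # rejected and rolled back

departments.start_update(d5)
d5["dname"] = "Brand"
renamed = departments.update_instance(d5)    # accepted
```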

Compared to the instance creation methods of OQL [Cattell, 1995], SOODAS is object oriented, while OQL is functionally oriented. A new Department instance can be created and made persistent in SOODAS as follows:

That is, a new instance of Department is created by sending the message new (inherited from the class Object) to the Department class. The messages dno:, dname:, dbudget:, ParentUnitID:, and ManagedByID: are sent to this instance with appropriate parameters (these methods were generated when the Department class was created). It is made persistent by sending the message AddInstance: to the Department class. The instance is a Smalltalk object and can be sent any message valid for Department instances. Alternatively, this can be accomplished using a generic new: method:

Department new: #(dno: '005' dname: 'Marketing' dbudget: 100000 ParentUnitID: '001' ManagedByID: '1020')

OQL takes a functional approach, using type constructors as follows:

The Department instance must then be bound to a Smalltalk object and accessed via a query. We do not argue that the object oriented approach is better than the functional approach. We simply point out that OQL has taken a functional approach similar to the hybrid language C++ while SOODAS has taken an object approach based on the OOPL Smalltalk. This difference also appears in the query language as discussed below.

Relationship has three OML methods for maintaining relationship-instances (in the E-R sense): Connect:and:using:, Disconnect:and:using: and ReConnect:from:to:using:. It has three OML methods for accessing relationship-instances: RelatedTo:using:, FirstFor:using: and AllRelatedTo:using:. In all cases the using: parameter is the relationship name or access name; all other parameters are entity-instances. Users can maintain related instances by using the generated relationship methods (e.g., HomeTo: and HomeToID:). They will typically not use the relationship maintenance methods directly. However, since the generated relationship methods use them, they are briefly described below.

Connect: i1 and: i2 using: rName establishes relationship rName between entity instances i1 and i2, provided that each is of the appropriate entity and that relationship cardinality constraints are not violated. In the case of recursive relationships, rName must be an access name (i.e., access1 or access2). DisConnect: i1 and: i2 using: rName removes the relationship rName between entity instances i1 and i2, provided it exists and that relationship cardinality constraints are not violated by its removal. ReConnect: i1 from: i2 to: i3 using: rName removes the relationship rName between entity instances i1 and i2, if it exists, and creates one between entity instances i1 and i3, ensuring that each instance is of the appropriate entity and that relationship cardinality constraints are not violated at the end of the update. ReConnect:from:to:using: is needed because minimum cardinality constraints may be violated during the update process.
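The need for a combined reconnect operation can be seen in a small Python sketch. It is an illustration only, not SOODAS code: the Report class and its method names are hypothetical, and it models a single relationship in which every employee must be related to exactly one department. Neither a disconnect followed by a connect nor a connect followed by a disconnect is possible, because the intermediate state violates a cardinality constraint; the reconnect checks the constraint only on the final state.

```python
class Report:
    """Employee -> Department, with exactly one department per employee."""

    MIN_DEPARTMENTS = 1
    MAX_DEPARTMENTS = 1

    def __init__(self):
        self.departments_of = {}  # employee -> set of related departments

    def connect(self, employee, department):
        proposed = self.departments_of.get(employee, set()) | {department}
        if len(proposed) > self.MAX_DEPARTMENTS:
            raise ValueError("maximum cardinality violated")
        self.departments_of[employee] = proposed

    def disconnect(self, employee, department):
        proposed = self.departments_of.get(employee, set()) - {department}
        if len(proposed) < self.MIN_DEPARTMENTS:
            raise ValueError("minimum cardinality violated")
        self.departments_of[employee] = proposed

    def reconnect(self, employee, old, new):
        # Build the final state first and validate only that state.
        proposed = (self.departments_of.get(employee, set()) - {old}) | {new}
        if not self.MIN_DEPARTMENTS <= len(proposed) <= self.MAX_DEPARTMENTS:
            raise ValueError("cardinality violated after reconnect")
        self.departments_of[employee] = proposed


report = Report()
report.connect("1020", "001")

try:
    report.disconnect("1020", "001")  # would leave the employee with no department
    disconnect_first_possible = True
except ValueError:
    disconnect_first_possible = False

try:
    report.connect("1020", "005")     # would give the employee two departments
    connect_first_possible = True
except ValueError:
    connect_first_possible = False

report.reconnect("1020", "001", "005")
```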

Similarly, users can obtain related instances by using the generated relationship methods (e.g., HomeTo, HomeToID, and HomeToC). They will typically not use the relationship accessing methods directly. Again, however, since the generated relationship methods use them, they are briefly described below.

RelatedTo: i1 using: rName answers a collection containing the instances related to instance i1 in the relationship rName. FirstFor: i1 using: rName answers the first instance related to i1 in the relationship rName ("first" is arbitrarily defined since pairs of related instances are not sequenced). AllRelatedTo: i1 using: aName answers the transitive closure of instances related to i1 in the relationship represented by the access name aName. If the relationship represented by aName is not recursive, then AllRelatedTo:using: and RelatedTo:using: answer identical collections.
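The difference between direct access and transitive closure can be sketched in Python over a list of related pairs. This is a minimal illustration, not the SOODAS implementation: related_to and all_related_to are hypothetical function names, the pairs are (child, parent) tuples, and the department numbers other than '001' and '005' are made up.

```python
def related_to(pairs, inst):
    """Direct partners: every parent paired with `inst`."""
    return [parent for child, parent in pairs if child == inst]


def all_related_to(pairs, inst):
    """Transitive closure: keep following pairs until nothing new is found."""
    seen, frontier = [], [inst]
    while frontier:
        current = frontier.pop()
        for nxt in related_to(pairs, current):
            if nxt not in seen:  # guards against cycles and repeated visits
                seen.append(nxt)
                frontier.append(nxt)
    return seen


# Recursive relationship: department -> parent department.
org_structure = [("003", "002"), ("002", "001"), ("005", "001")]
direct_parents = related_to(org_structure, "003")
all_ancestors = all_related_to(org_structure, "003")

# Non-recursive relationship: employee -> department.
reports_to = [("1020", "001"), ("1021", "005")]
direct = related_to(reports_to, "1020")
closure = all_related_to(reports_to, "1020")
```

For the non-recursive pairs the two functions answer the same collection, matching the remark above.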


4.3 Creating User Interfaces

The SOODAS class EntityInterface provides the basic functionality needed to maintain instances. It has one method: CreateInterfaceFor: anEntity. When sent to EntityInterface this method generates a display screen containing a scrollable list of instances of anEntity in a default format and buttons to add, delete, edit, and find instances. It also generates a default data entry screen which is displayed when the add or edit buttons are selected. The data entry screen contains an input field for each attribute and a combo box for each one-to-many relationship for which anEntity is on the many side (i.e., the screen supports the maintenance of one-to-many relationships in what appears to the user as the familiar "foreign key" format).

Figure 5 illustrates the screens generated when the following message is sent.

EntityInterface CreateInterfaceFor: Department.

A column in the display screen and a combo box in the data entry screen are generated for the relationships Manage and OrgStructure. The default labels are ManagedBy and ParentUnit, the access names. User-supplied labels could easily be added to the interface generation process. Additional interface classes have been developed to support common update patterns such as parent-child relationships and many-to-many relationships.



5. SOODAS Query Language

In addition to the processing capabilities of Smalltalk, SOODAS supports an object-oriented query language via its meta-classes EntityObject and QueryNode. As discussed above, data may be processed by, for example, sending the message Instances to an entity and using the standard Smalltalk collection methods (e.g., do:, select:, collect:, detect:) to manipulate the resultant collection. In contrast to OQL, which is a "functional language" [Cattell, 1995, pg. 53], SOODAS is strictly object oriented (see Appendix 1 for an informal BNF). OQL extends the SQL2 syntax with object concepts but, strictly speaking, is not itself object oriented in that it does not conform to the message passing paradigm; its queries are specified as functional expressions rather than as messages sent to objects.

EntityObject has four class methods to support projection, selection, and ordering. SOODAS uses the keyword Project for projection criteria, select for selection criteria, and orderBy for ordering criteria. As is commonly done in OO implementations, the first three methods add "syntactic sugar" to the language. Each of them uses the last method, passing an empty code block and/or an empty string for the unused capabilities.
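The four selectors used in the rest of this section are Project:, Project:select:, Project:orderBy:, and Project:select:orderBy:. The delegation described above might be sketched as follows. The method bodies and the parameter names aCodeBlock and anOrderByString are illustrative assumptions, not the actual SOODAS source.

```smalltalk
"Class-side methods of EntityObject (hypothetical bodies)."

Project: aProjectString
    "Projection only: pass an empty block and an empty ordering string."
    ^self Project: aProjectString select: [] orderBy: ''

Project: aProjectString select: aCodeBlock
    "Projection and selection: no ordering."
    ^self Project: aProjectString select: aCodeBlock orderBy: ''

Project: aProjectString orderBy: anOrderByString
    "Projection and ordering: no selection."
    ^self Project: aProjectString select: [] orderBy: anOrderByString
```

Keeping all of the query-building logic in Project:select:orderBy: means the three shorter forms cannot drift out of step with it.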

When sent to an entity these messages answer an instance of QueryNode (see below) defining the entity's part in the query. Every query has a set of values (objects) to include (project) in the output. These are specified as instance methods of the receiver of the message, separated by blank spaces in the string, aProjectString. Any instance method of the receiver class may be used in aProjectString. This includes the method self which answers the instance itself as well as methods that answer multiple values, perform aggregate functions, or even answer other instances.

For example, assuming that avgYTDGross is an instance method of Department which calculates the average year-to-date gross pay of employees in the department, the following query produces a report of departments with their budgets, average year-to-date gross pay, and a list of their employees.
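Based on the projection string quoted in the explanation that follows, this query can be sketched as a single message sent to the Department class:

```smalltalk
"Answers a QueryNode; nothing is materialized until the node is asked for its result."
Department Project: 'dno dname dbudget avgYTDGross EmploysC'
```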

The message Project:, with the string 'dno dname dbudget avgYTDGross EmploysC' as its parameter, is sent to the class Department, which answers a QueryNode instance that, when materialized, answers the values answered by sending those messages to each of its instances. In OQL [Cattell, 1995] this same query would be stated as:

From the user's perspective, the messages dno, dname, dbudget, avgYTDGross, and Employs are sent to x, a representative of Department; however, there is no object to which the query itself is sent. In this sense OQL is functional and not object oriented. SOODAS is strictly object oriented in that the only means to produce a result is to send a message to an object.

Queries may optionally have selection criteria and sorting criteria. Selection criteria are specified as a code block (the select: parameter). A code block is an object containing an executable set of Smalltalk statements (objects and messages sent to objects) enclosed in brackets. A selection code block must evaluate to true or false. Sorting criteria are specified as a string (the orderBy: parameter) containing the message sequence defining the ascending or descending collating sequence.

A code block to specify a selection criterion has the form [:example | example messageSequence].

Here example is an iterator variable, playing a role not unlike that of an example in Query By Example. It is defined within the scope of the code block and references, in turn, each instance of the entity to which the query message is sent. Hence, the sequence of messages in messageSequence is sent to each instance of the entity. Since the answer to a message is an object, messageSequence can use relationship methods to reference instances of any entity to which the receiver class is related by any sequence of relationships. If the answer from a message is a collection (e.g., EmploysC), quantifier methods such as all: and any: can be used to process it. Comparison operators such as =, >, >=, <, <=, and ~= can be included in messageSequence since they are simply messages in Smalltalk. For example, when sent as the select: parameter of a Project:select: message sent to Employee, the following code block selects employees who have an employee number exceeding '5000':
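A sketch of such a block, shown inside a complete query. The attribute name eno is taken from the later examples in this section, and the projection string is an illustrative assumption.

```smalltalk
"The block is evaluated once per Employee instance; e is the iterator variable."
Employee
    Project: 'eno ename'
    select: [:e | e eno > '5000']
```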

As mentioned above, selection conditions can be specified on related entities. This is accomplished using a message sequence that includes relationship methods. For example, the following code block selects employees who live in states whose tax rate exceeds 5%:
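A sketch of such a block, using the relationship method LivesIn and the attribute taxRate described in the next paragraph. Representing 5% as the number 0.05 is an assumption about how tax rates are stored.

```smalltalk
"e LivesIn answers the related TaxState; taxRate is then sent to that instance."
[:e | e LivesIn taxRate > 0.05]
```

Because unary messages bind more tightly than binary ones in Smalltalk, e LivesIn taxRate is evaluated first and the comparison is applied to its answer.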

When sent the message LivesIn, e (an employee) answers its related TaxState instance. When sent the message taxRate, this TaxState instance answers its taxRate.

As mentioned above, SOODAS also includes methods for set selection, including any: and all:. For example, the following (nested) code block, when sent as the select: parameter of a Project:select: message sent to Department, selects departments where all employees in the department have ytdGross in excess of $100,000:
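Using the inner block quoted in the explanation that follows, the nested block can be sketched as:

```smalltalk
"Outer block: d is the current department.
 Inner block: e ranges over the employees answered by d EmploysC."
[:d | d EmploysC all: [:e | e ytdGross > 100000]]
```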

d iterates over all departments. When sent the message EmploysC, d (the current department) answers the collection of its employees. This collection is sent the message all: with the code block [:e | e ytdGross > 100000] as its parameter. For each collection of employees to which all: is sent, e iterates over those employees, and all: answers true if every employee in the collection answers a value greater than 100000 when sent the message ytdGross.

Furthermore, iterator variables can be referenced inside nested code blocks to select instances based on values in related instances. For example, the following code block selects departments where any employee who reports to the department has a ytdGross that exceeds that department's budget:
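A sketch of such a block, assuming that the department's budget is answered by the method dbudget used in the earlier projection example:

```smalltalk
"The inner block refers to d, the iterator variable of the enclosing block,
 so each employee is compared with the budget of his or her own department."
[:d | d EmploysC any: [:e | e ytdGross > d dbudget]]
```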

A string that specifies a sorting criterion contains a message sequence defining the collating sequence, optionally followed by the keyword ASC for ascending or DESC for descending. If neither keyword appears, ASC is assumed. For example, assuming the receiver is Employee, the orderByString 'eno ASC' sorts employees by employee number (increasing). 'eno DESC' sorts in decreasing employee number order. Objects of any type can be used for sorting provided they have appropriately defined comparison methods. For example, assuming the receiver is Department, the orderByString 'EmploysC size DESC' sorts departments in decreasing order of the number of employees reporting to them.
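The two ordering strings above might be used in complete queries as sketched below. The projection strings are illustrative assumptions.

```smalltalk
"Employees in decreasing employee number order."
Employee Project: 'eno ename' orderBy: 'eno DESC'.

"Departments ordered by head count, largest first."
Department Project: 'dno dname' orderBy: 'EmploysC size DESC'
```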

The ability to join entities along relationships is a key capability of any query language. SOODAS supports joins using QueryNode and its instance method, Join: aQueryNode using: aRelationshipName. When sent to a QueryNode instance, this method links the QueryNode instance aQueryNode to the query definition graph of the receiver provided it contains an entity participating in the relationship, aRelationshipName. It answers the receiver to which another Join:using: message can be sent. Hence, queries involving multiple joins can be easily and naturally stated.
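A minimal two-entity join might be sketched as follows. The relationship name PayrollTax and the attribute names are taken from the example discussed below; the choice of projections is an assumption.

```smalltalk
"The receiver and the argument are both QueryNode instances.
 Join:using: answers the receiver, so further joins can be chained."
(TaxState Project: 'stateAbbr taxRate')
    Join: (Employee Project: 'eno ename')
    using: 'PayrollTax'
```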

To illustrate the way in which a query graph is constructed, consider the query illustrated in Figure 6. This query lists tax states (showing the state abbreviation and tax rate) of employees (showing the number, name, social security number and year to date gross pay) whose year to date gross pay exceeds $100,000, including the department to which the employee reports (name and budget), and the department manager (name and social security number), organized by tax state, alphabetically by employee name within decreasing order of state tax rate. It involves three entities, TaxState, Employee, and Department. Employee plays two different roles (worker and manager), via two different relationships (Report and Manage).

The query graph for this query is constructed as follows (see Figure 6). When sent the message Project: 'stateAbbr taxRate' orderBy: 'taxRate DESC', TaxState creates an instance of QueryNode (say qn1) containing the definition of this part of the query, including the class to which it is to be sent (e2 is set to TaxState), the projectString (project is set to 'stateAbbr taxRate') and the orderByString (orderBy is set to 'taxRate DESC'). This part of the query has no select: clause (where is set to [], a null code block). Employee similarly creates a QueryNode instance (say qn2) when sent the message Project: 'eno ename ssn ytdGross' select: [:e | e ytdGross > 100000] orderBy: 'ename'. When sent the message Join: qn2 using: 'PayrollTax', qn1 connects itself and qn2 using a two way linked list (next in j1 is set to j2 and prior in j2 is set to j1), stores the relationship (rel is set to PayrollTax in j2) and answers itself. [4] Department and Employee similarly create additional QueryNode instances (say qn3 and qn4) that store the parameters of the subsequent Join:using: messages and are appropriately interconnected to represent the query.
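The construction just described can be sketched step by step. The first two messages and the relationship names PayrollTax, Report, and Manage come from the text. The projection strings for qn3 and qn4 and the order in which the remaining joins are chained are assumptions inferred from the description of the query.

```smalltalk
"Sketch of the Figure 6 query, one QueryNode per role."
| qn1 qn2 qn3 qn4 query |
qn1 := TaxState Project: 'stateAbbr taxRate' orderBy: 'taxRate DESC'.
qn2 := Employee
    Project: 'eno ename ssn ytdGross'
    select: [:e | e ytdGross > 100000]
    orderBy: 'ename'.
"Assumed projections: department name and budget, manager name and SSN."
qn3 := Department Project: 'dname dbudget'.
qn4 := Employee Project: 'ename ssn'.
"Each Join:using: answers its receiver, so the joins are chained on qn1."
query := ((qn1 Join: qn2 using: 'PayrollTax')
    Join: qn3 using: 'Report')
    Join: qn4 using: 'Manage'.
query asOrderedCollection
```

Holding each node in a temporary variable mirrors the qn1 to qn4 naming used in the explanation; the same query could equally be written as one nested expression.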

A query graph specifies the logical definition of a query. How it is materialized depends on the query optimization strategy. The current SOODAS query processor does not optimize execution; it simply materializes the query in hierarchic order as specified. Query optimization in an object environment is an area of further research.

A QueryNode instance materializes the query result as an ordered collection of objects when sent the message asOrderedCollection. This result can then be sent any Smalltalk message valid for an ordered collection, including printString, which will convert it into printable form.

For example, the query (Employee Project: 'eno ename') asOrderedCollection printString is executed as follows. The message Project: is sent to the Employee class with the parameter 'eno ename'. Employee creates and answers an appropriately defined QueryNode instance. The message asOrderedCollection is sent to this QueryNode instance, which answers a collection containing the employee number and name of each employee, including all subclass instances. The message printString is sent to this collection, which converts it into printable form. The resulting object could be stored in a variable, sent to the system printer, or saved in a file.
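As a sketch of how a materialized result might be kept and processed further, the following stores the collection in a temporary variable and writes its printable form to the Transcript. The variable name result is hypothetical.

```smalltalk
| result |
"Materialize the query once and keep the ordered collection."
result := (Employee Project: 'eno ename') asOrderedCollection.
"Any message valid for an ordered collection can now be sent to result."
Transcript show: result printString; cr.
result size
```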

Because referents and methods can be used in projection and selection criteria, SOODAS provides a powerful, flexible, and easy to use query language. Furthermore, by directly implementing E-R semantics and providing entity-based data maintenance capabilities, SOODAS provides a powerful system development environment. Appendix 2 presents a number of queries that illustrate the power of its query language.


6. Summary and Directions for Future Research

SOODAS demonstrates the feasibility of directly supporting Entity-Relationship semantics using a purely Object-Oriented paradigm. Three meta-classes, EntityObject, Relationship, and QueryNode, provide the structure and operations necessary to define, maintain, and access data using E-R semantics. These are supported by the class PermanentObject, which provides persistence and data access capabilities, and the class EntityInterface, which provides support for developing maintenance screens.

SOODAS includes purely object-oriented object definition (ODL) and manipulation (OML) languages that directly support Entity-Relationship semantics. These are implemented as methods of our meta-classes. Hence, their use can be seamlessly integrated into any Smalltalk application. This overcomes difficulties that may be caused by the impedance mismatch between object programming languages and functional database languages.

The SOODAS query language is extremely powerful. It supports set-oriented queries in a straightforward and natural way. It is extensible in the sense that methods needed for complex projection or selection criteria can be added to the appropriate entities as needed. SOODAS queries can be seamlessly integrated into Smalltalk applications.

The current version of SOODAS can be used as a rapid prototyping tool since it enables application developers to directly implement a conceptual object model into a working system with minimal effort. Through EntityInterface it provides the ability to quickly generate object maintenance screens.

SOODAS is, however, limited in several respects, each of which is the subject of further research efforts. Currently SOODAS relies on the VisualWorks [1994] memory manager for moving data (and methods) to and from permanent storage. Given the trend toward integrating operating system and data management functions into objects that provide system capabilities, this architecture appears to have promise. A significant amount of research is needed to address the efficiency issues associated with such an architecture. In particular, caching, data clustering, indexing, and paging algorithms will need to be investigated. Alternatively, SOODAS could be ported to an OODBMS such as GemStone, which directly supports Smalltalk.

Related to the memory management and data organization issues is query optimization. Query optimization in a paged, virtual memory storage environment has received very little attention. This problem will be even more significant in a distributed object environment where objects may "live" in different geographic locations yet must still participate in query and update operations [Guttman and Matthews, 1995].

Results of SOODAS queries are currently produced as a collection of (possibly nested collections of) objects (values and instances). Future research will address formatting issues by creating Report as a subclass of PermanentObject with instance variables such as reportName to identify the report, reportSpecifications containing its formatting specifications, and query containing a QueryNode instance by which to access its query. It will have instance methods such as produceReportOn: where the parameter is the output object, such as a printer or file.

Smaller in scope, but still requiring further research, are a number of SOODAS design issues. First, if any constraint is violated by an update operation, the operation is canceled. Future research will be directed at the specification of alternate constraint violation actions. Actions such as cascade delete must be supported; however, additional, user-specified actions must also be supported. For example, if the user specifies the identifier of a non-existent entity for a relationship, the designer may wish to allow the user to add a new instance of the related entity. Similarly, future research will address the specification and enforcement of mutual dependency constraints (i.e., relationships where both minimum cardinalities are greater than 0).

Second, since the Smalltalk class hierarchy is directly used to implement entities, only generalization hierarchies with mutually exclusive subtypes are supported. SOODAS provides a mechanism to support partition but not cover constraints. Future research will address the specification of multiple overlapping subtypes with a full set of subtype constraints [Gottlob et al., 1996]. Beyond this, future research will address the specification of a generalization lattice with multiple inheritance and supertype constraints.

Finally, the current implementation does not support aggregation relationships. An aggregation is a collection of (aggregation) entities that constitute the "parts" of another (aggregate) entity. Such structures must include constraints specifying that instances of each of the aggregation entities must exist for each instance of the aggregate entity.

From a practical perspective, SOODAS must be evaluated empirically. Two of the claimed advantages of Object Orientation are development efficiency and code re-usability. A key concern, and one for empirical research, is the impact of a tool like SOODAS on system development effort, query formulation, code re-use, and maintainability. Preliminary evidence is currently being gathered by using SOODAS as the implementation environment for teaching OO development.


Appendix 1

Appendix 2

References

Footnotes


This page last updated on February 21, 1997


This page designed and maintained by Nikki Michalowske. misrc@csom.umm.edu