There are two key approaches to managing data: the file system and the database. Both structure data and facilitate access to it, but they differ in fundamental ways, and in several respects the database approach can be seen as solving problems associated with the file system. The first significant difference is that a file system holds differentiated data whereas a database holds integrated data. In other words, under the file approach each department of an organisation keeps its own filing system, while a database collects and centralises all of the organisation's data. This may seem of little import to a small organisation like UNDRFM, but it has an important consequence: data stored in a database can be made more secure, because access to it can be controlled by a central administrator. The second major difference, which stems from the first, is that database data is independent of the programs that use it whereas file data is dependent on them. In its simplest formulation, a user or application searching a database does not need to know how the data is physically structured or located, whereas a program searching a file system must know this exactly. Put another way, a database stores meta-data describing its own structure, and filed data does not. This gives the database greater flexibility, which is useful to this project for two reasons: it lets us provide access to a large number of users without having to share the structuring logic of a file system, and it lets us modify the database structure without having to alter the code of the programs that access it.
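The point about program independence can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout, the member name and the function member_names are all assumptions made for the illustration, not part of the UNDRFM design. The "application" asks for data by name only, so it keeps working after the table is restructured.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT PRIMARY KEY, home_city TEXT)")
conn.execute("INSERT INTO members VALUES ('Member A', 'London')")  # invented row


def member_names(connection):
    """An 'application' that asks for data by name, not by storage position."""
    rows = connection.execute("SELECT name FROM members ORDER BY name")
    return [row[0] for row in rows]


before = member_names(conn)

# The administrator restructures the table; the application code is untouched.
conn.execute("ALTER TABLE members ADD COLUMN email_address TEXT")
after = member_names(conn)

print(before == after)
```

A program that read a flat file by fixed character positions would, by contrast, need rewriting whenever a field was added.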
The database approach, with its improved security, flexibility and program independence, is therefore the most suitable data management system for this project. The most common type of database is the relational database, which is usually designed in terms of entities and the relationships between them. An entity is a thing that is represented and described by the data, and a relationship is an association that exists between two or more entities. For us the two entities are: a) members and b) websites. The relationship between them is: a member's work is published on a website. The design of a database begins by mapping the logical structure that will house the data, that is, by sketching the entities and their possible relationships in an entity-relationship (ER) scheme. The ER scheme provides the blueprint from which the database is built.
SQL (Structured Query Language) is the standard language used to communicate with relational databases. The UNDRFM members database will be created, populated and searched using SQL. For example:
CREATE TABLE Members (
    Name CHAR(50) PRIMARY KEY,
    Email_address CHAR(50),
    Home_city CHAR(25),
    Published_on INTEGER  -- web id of the site that published the member's work
);
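As a rough sketch of how the table might be created and populated in practice, the statement can be run through Python's built-in sqlite3 module. Several things here are assumptions: SQLite stands in for whatever database engine the project eventually uses, column names are written with underscores, published_on is stored as an integer website id, and the member details are invented sample data. The second insert shows what the primary key is for: a duplicate name is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE members (
        name          CHAR(50) PRIMARY KEY,
        email_address CHAR(50),
        home_city     CHAR(25),
        published_on  INTEGER
    )
""")

# Populate: the values are invented sample data.
conn.execute(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    ("Member A", "a@example.org", "London", 1),
)

# The primary key means a second row with the same name is refused.
try:
    conn.execute(
        "INSERT INTO members VALUES (?, ?, ?, ?)",
        ("Member A", "other@example.org", "Leeds", 1),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
print(duplicate_rejected, row_count)
```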
In order to fully populate the “Website” data table it is necessary to input the URLs of phonography websites, so this information will have to be retrieved. There are three perspectives from which to analyse information retrieval. Firstly there is the user, who possesses an information need. In this case UNDRFM needs the URLs of phonography websites. In Broder's taxonomy this is a 'navigational query', as its target is the home page of an organisation, e.g. the London Sound Survey. Secondly there is the system that is used to satisfy this information need. This term refers to the software and hardware used to store, locate and process the required information. In this instance the system view broadly covers the internet, the WWW and HTML. The internet is the vast network of networks connecting innumerable computers via electronic, wireless and fibre-optic connections into one massive super-network. The internet is the infrastructure and the WWW is one service that uses it: a system of web pages connected by hypertext links. One either follows a link or types in a Uniform Resource Locator (URL). The web browser then requests the corresponding page from the web server on its networked host, which returns it. The last important component of our system view is HyperText Markup Language (HTML), which describes the structure and content of web pages. It is not a programming language; it is a markup language consisting of tags enclosed in angle brackets (< >) that usually come in pairs: a start tag and an end tag. The third and final perspective from which to view information retrieval is that of the source. This view covers the providers of the information and in our example includes the Sonic Arts Network.
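The paired start and end tags, and the hyperlinks that connect pages, can be seen by running a small page through Python's built-in html.parser module. This is only a sketch: the page text, the example.org address and the class name TagAndLinkCollector are invented for the illustration.

```python
from html.parser import HTMLParser

# An invented page; the URL is a placeholder, not a real phonography site.
PAGE = """<html><body>
<h1>Field recording sites</h1>
<p>Visit the <a href="https://example.org/survey">survey</a> page.</p>
</body></html>"""


class TagAndLinkCollector(HTMLParser):
    """Records every start tag, every end tag and every hyperlink target."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "a":
            # attrs is a list of (name, value) pairs; href holds the URL.
            self.links.extend(value for name, value in attrs if name == "href")

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


collector = TagAndLinkCollector()
collector.feed(PAGE)
collector.close()

print(collector.start_tags)
print(collector.end_tags)
print(collector.links)
```

In this page every start tag has a matching end tag, and the single link is the kind of URL that would be entered into the website table.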
We will be performing a keyword search using the index terms “field recording”. A Google exact-match search on the phrase “field recording” returned around 2,330,000 web pages in 0.17 seconds; a search engine can answer this quickly because it looks the terms up in an inverted file (an index from terms to the pages containing them) rather than scanning the pages themselves. From these results we will use a browsing technique to locate the URLs of field recording publishing sites. This involves using hypertext, the text on a web page that contains links (hyperlinks) to other documents on the web and that can be clicked to reach them, to navigate from one field recording website to another. If the keyword search fails to yield sufficient results we can modify the search query in two ways: 1) by searching on the synonym “phonography”, or 2) by trimming the words to their root (stemming), e.g. “sound record” and “phonograph”. There are two ways of evaluating our search: qualitatively or quantitatively. For this project we will conduct a qualitative analysis of the search's efficacy from the perspective of the client. This will take the form of a questionnaire sent out to UNDRFM members that simply asks whether the results of the search were useful to them.
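The idea of an inverted file, and the two ways of widening the query, can be sketched on a toy scale. Everything below is an assumption made for illustration: the three "pages" are invented, the search requires all query words to be present rather than matching an exact phrase, and stem is a deliberately crude suffix stripper, not a real stemming algorithm such as Porter's.

```python
from collections import defaultdict

# Invented toy documents standing in for web pages.
PAGES = {
    1: "field recording archive of city sounds",
    2: "phonography and field recordings from the coast",
    3: "recorded music reviews",
}


def stem(word):
    """Very crude suffix stripping; keeps at least four letters of the word."""
    for suffix in ("ings", "ing", "ed", "s", "y"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def build_index(pages, normalise=str):
    """Inverted file: maps each (normalised) term to the ids of pages using it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[normalise(word)].add(page_id)
    return index


def search(index, query, normalise=str):
    """Return the ids of pages that contain every term in the query."""
    postings = [index.get(normalise(w), set()) for w in query.lower().split()]
    return set.intersection(*postings) if postings else set()


plain_index = build_index(PAGES)
stemmed_index = build_index(PAGES, normalise=stem)

exact_hits = search(plain_index, "field recording")
synonym_hits = search(plain_index, "phonography")
stemmed_hits = search(stemmed_index, "field recording", normalise=stem)

print(exact_hits, synonym_hits, stemmed_hits)
```

On this toy data the unmodified query finds only page 1, the synonym finds page 2, and the stemmed query finds both, which is the effect the two query modifications are meant to have.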
When unstructured information is searched using the retrieval methods discussed above, the results are probabilistic: the web pages displayed 'probably' meet the needs of the user. The searches that users of the UNDRFM database perform are by contrast deterministic, that is, they return exactly the records that match the conditions stated in the query. They are very precise searches for information you know is there, because it is your information and it is structured. IR uses natural language, whereas database searches use SQL:
The command: SELECT name FROM members; will produce a result listing all the members' names.
SELECT organisation FROM websites WHERE web_id = 1; will display the organisation whose website has the id 1, i.e. London Sound Survey.
SELECT name, url
FROM members, websites
WHERE published_on = 1
AND published_on = web_id;
will display the names of all the people whose work is published on London Sound Survey's website, as well as the site's URL.
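To make the three queries concrete, here is a small sketch that runs them against an in-memory SQLite database through Python's sqlite3 module. The column names with underscores, the second website, the member names and the example.org URLs are all invented for the illustration; only London Sound Survey having web id 1 comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (
        web_id       INTEGER PRIMARY KEY,
        organisation CHAR(50),
        url          CHAR(100)
    );
    CREATE TABLE members (
        name         CHAR(50) PRIMARY KEY,
        published_on INTEGER REFERENCES websites(web_id)
    );
""")

# Invented sample rows; the URLs are placeholders.
conn.executemany("INSERT INTO websites VALUES (?, ?, ?)", [
    (1, "London Sound Survey", "https://example.org/lss"),
    (2, "Another Site", "https://example.org/other"),
])
conn.executemany("INSERT INTO members VALUES (?, ?)", [
    ("Member A", 1),
    ("Member B", 2),
    ("Member C", 1),
])

all_names = conn.execute(
    "SELECT name FROM members ORDER BY name"
).fetchall()

site_one = conn.execute(
    "SELECT organisation FROM websites WHERE web_id = 1"
).fetchall()

# The join: members whose work is published on website 1, with its URL.
published_on_one = conn.execute("""
    SELECT name, url
    FROM members, websites
    WHERE published_on = 1
    AND published_on = web_id
    ORDER BY name
""").fetchall()

print(all_names)
print(site_one)
print(published_on_one)
```

Running the same queries again on the same data gives the same rows, which is the sense in which these searches are deterministic.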
The grammar is very strict. Unlike natural language, where you may still be understood if you err, the smallest mistake in the syntax of an SQL statement means the computer will fail to parse and execute it. This is a problem with databases in particular and web 1.0 technology in general: they require the developer to become a polyglot.
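The strictness is easy to demonstrate. In the sketch below, which assumes SQLite via Python's sqlite3 module and uses an invented one-row table and a hypothetical helper called run, a single missing letter in the keyword SELECT is enough for the statement to be rejected outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO members VALUES ('Member A')")  # invented row


def run(sql):
    """Run a statement, returning its rows or a message if it is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as error:
        return f"rejected: {error}"


good = run("SELECT name FROM members")
bad = run("SELEC name FROM members")  # one letter missing

print(good)
print(bad)
```

A human reader would understand the second request without difficulty; the database does not attempt to guess.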
