This section of the ReStore site provides general guidance for web resource authors, particularly those working on ESRC-funded research methods resources. These materials are being further developed and will be offered both in interactive format and as a downloadable handbook. Please bookmark the page and come back for further updates!
Introduction to sustainable web resources guidelines  |
| Authors increasingly cite web pages and other digital objects on the Internet, which can "disappear" overnight. In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months [1].
|
|
The online information revolution has provided numerous opportunities to express, share and communicate more quickly and easily. The web has evolved into an enormously rich but largely unstructured source of data and information. The phenomenal growth in the web reflects widening access and developing web technologies, but also the increased availability of really useful content. Creating, publishing and accessing content have become commonplace, especially due to social networking, online communities and a series of technologies which have become known as Web 2.0.[2] In such an environment, where most web content is uncontrolled, it has rapidly become very difficult to assess and track the highest quality web content.
|
|
During the last decade, numerous technical standards and languages have emerged for the mark-up and scripting of information to be presented on the web. These include HTML, XHTML, CSS, ASP, JSP, Perl, and PHP. The software platforms developed around such languages and technologies are continually evolving to create a complex hybrid web landscape. In such a situation it is increasingly likely that today's content will disappear or simply be incompatible with tomorrow's web technologies. Every user is familiar with the frustration of attempting to follow a broken link or opening web content which can no longer be read, played or viewed. Clearly it is possible to continuously maintain websites, but not all content is worth such effort. An important question which arises is therefore how to discern the quality of web content when there is no straightforward litmus test. In particular, how should we determine what needs to be preserved for future use? Associated with this question are issues relating to content ownership and copyright legislation, which aims to protect the rights of the creator or owner when content is moved or copied from one place to another.
|
|
Alongside the enormous potential of the web as an information resource, the relative scarcity of funding for research and teaching demands that we spend extremely wisely when investing monetary resources from research councils and other funders in the creation of online resources. Inevitably, we should consider how best to maximise the impact of funding dedicated to web resource creation and dissemination, and how to avoid duplicating effort, by rigorously promoting high quality online resources. The full realisation of this objective would only be possible if we could be assured that investment in online resources will not be rapidly undermined by changing web standards and technologies.
|
|
Objectives  |
|
The heterogeneous types of web content, creators and/or owners, and aspects of quality will all be considered here. These guidelines, primarily aimed at the creation of sustainable online resources, address issues which arise at the very start of web resource creation and also those which are faced by authors and content owners at the end of the funding period. It is to be hoped that research council funding will result in many high-quality and useful web resources (although it is neither a necessary nor sufficient condition!). It is therefore strongly desirable to have a strategy in place, before the team moves on to another project or even alternative employment, for preserving the web resource in a form which remains accessible and valuable to users. These guidelines are intended to address exactly this challenge.
|
|
Specifically, this document offers to (a) explain, (b) standardise and (c) streamline the process of creating high quality online resources from ESRC-funded projects, with a particular focus on research methods. This is not a technical document and in many areas where there are already excellent published guidelines and standards we will direct the reader to these.
|
|
More generally, our aim is to raise awareness amongst web resource project proposers, researchers, authors, editors, contributors and students about the importance of preservation of their valuable web resources before they venture on unplanned publication in the enormous unstructured ocean of the World Wide Web.
|
|
Who is it for?  |
| This handbook of guidelines for sustainable web resources follows standard web resource creation principles and also draws on practical experiences of restoring ESRC-funded online resources in the ReStore web repository project. It includes specific recommendations for social science researchers who are about to create a web resource having secured ESRC funding or who are considering options for the preservation of their online resources following the end of the initial project funding. Such researchers may include postgraduate students, research assistants/associates, teachers, project investigators, authors and co-authors. There are issues here which are (or should be) important to anyone who contributes to the creation of online content, if for no other reason than that they are contributing intellectual property, the ownership of which will be an issue for anyone who attempts to work with the resources after they have moved on. Having gone through the relevant sections of this document, the reader should be able to:
|
- Find the information required to plan and create a variety of types of web resource
- Understand the major issues impacting on the sustainability of web resource development
- Understand the basic principles governing intellectual property in web resource creation
- Plan in advance the actions necessary for subsequent restoration of their own web resources
- Add value to their work from the outset by planning for its long-term availability to other researchers
|
|
Why sustain web resources?  |
| Here we will consider the nature of online resources, the risk of deterioration, and the challenge of actively preserving a resource before it's too late. By sustaining a resource we aim to make sure that it remains available in the same fashion to users beyond the original project funding period. It is in this context that we have chosen to adopt the term "Restoration", which refers to the preservation, updating and maintenance of a web resource for a specified period of time. This goes beyond merely "preserving" an exact snapshot of a resource and embraces dynamic curation of the site, adapting its content and presentation as necessary to ensure continued utility to users.
|
| How and when to sustain?
|
| Before we proceed any further and discuss the "how" part of sustainability, it would be helpful to take as an example a typical research council-funded research project which produces online resources as part of its activity. ESRC invests heavily in research methods projects which create, as part of their activities, online training and resource materials, often with considerable interactive or reference-value content. Typically, the development of an on-line resource is time-consuming and expensive and the full value of the resource only comes into play close to the point at which funding ends. The example of an ESRC-funded project which advances a particular research method and produces online tutorial materials for the guidance of other researchers is presented graphically in Figure 1.
|
|
|
At the start of the project there is no online presence and user awareness of the research is low. As the project team present their work at conferences and create an initial website, user awareness increases but the utility of the actual online resources is not realised until the end of the project when the content of the website is complete and the resource is widely publicized. The resources are likely to be extremely useful to researchers, including postgraduate students, mid-career academics and experienced research staff in the commercial and public sectors.
|
|
The online resources reach their peak utility at around the time that funding ends but user awareness continues to increase as the materials are cited in other publications and presentations and also spread by word of mouth. Unfortunately, it is during this period of peak usage that, without any follow-on strategy, the online resource begins to decay. Perhaps a key textbook is published to which it makes no reference, a government department is renamed and a series of important URLs become broken or a new release of widely-used browser software does not correctly render illustrations within the resources. These deficiencies could all be readily addressed but the researcher has left and the investigators must prioritise other funding applications. The size of the gap between the user awareness and the quality of the resource, shown as area A in Figure 1, represents the "missed opportunity" for a return on the original investment - in this case missed impact for both the project team and ESRC. Without further action, the site will become unusable and cease to be recommended or used. These issues are not addressed by static web archiving projects which effectively freeze online resources at the peak of their utility, but take no action to moderate the decay of quality over time and do little to increase user awareness.
|
| What are "Sustainable web resources" all about?
|
| Sustainability generally refers to the quality of lasting longer as a result of resilience and robustness. A web resource in this sense is termed sustainable if it stands the test of time and maintains its original shape with the least possible human intervention. The preservation and upgrading of a typical web resource ideally needs to be addressed before its content is outdated, its links become non-functional or it ceases to be presented appropriately in users' web browsers. The need to sustain a web resource could be driven, for example, by the fact that it supports an active user base or contains valuable or definitive information, analysis software, research publications etc. The popularity of most resources declines when the resource creator stops updating and maintaining them and their deficiencies multiply and become more apparent to users. The challenges relate to technical, academic and organizational/legal aspects, and a resource cannot usually be made sustainable without addressing all three of these areas in parallel.
|
|
Some online resources continue to be maintained by their original creators or their institutions, especially where they are the work of well-established institutes or research centres that support a broad portfolio of online material within a particular field of research. This is an entirely acceptable solution providing there is some guarantee of longevity and quality management. Most of the basic considerations raised in this document will continue to apply to such resources.
|
|
The problem of how to sustain online resources beyond their initial period of funding is one that has long been recognized, for example by the Digital Curation Centre (http://www.dcc.ac.uk). An initiative called the UK Web Archive, launched by the UK Web Archiving Consortium, has been in the business of preservation and curation of web resources, mainly representing subjects of cultural, societal, religious, political and scientific significance to the UK. Common to these initiatives is the fact that they archive web resources in their current form without explicitly addressing the challenges of ongoing maintenance. In some cases this solution may suffice, but it can be problematic in situations where web pages are generated and updated dynamically. Search functions, navigation and external links are particularly prone to failure under this approach. Static archiving of this kind is well suited to resources which provide reference information that does not change over time, such as metadata or instructions relating to a specific version of a dataset - in this sense it is more akin to the archiving of research datasets.
|
|
More frequently, however, online resources become unserviceable because their content needs updating to remain useful, or because they were created using specific software tools or web standards that have since been superseded. Such resources may be expected to cease to be serviceable over time, even though their core content remains valuable to researchers. Once funding has ended and project teams have dispersed, there is little opportunity to update or maintain resources, with the consequence that valuable materials are lost to the research community.
|
|
It is in response to these challenges that ESRC funded the "Sustaining online resources in research methods" project, now entitled ReStore: a sustainable web resources repository (http://www.restore.ac.uk). As the name suggests, this type of active restoration comes into play when active development of the web resource stops. The average lifespan of a web page is 44-75 days [3]. The very short lifespan of online resources, if unmaintained, suggests that active restoration and preservation need to begin very soon after the end of funding of a web resource project if maximum utility is to be retained. This requires a clear framework for identifying resources that are appropriate for restoration and the tasks that are required. Many of these tasks are greatly facilitated if they have been anticipated earlier in the project; hence it is most certainly possible to make a web resource more "sustainable" by planning for this phase right from the outset. Questions for consideration include:
|
- Identification of "what" and "what not" needs sustaining (e.g. methods and principles do, project news items do not)
- Identification of any inappropriate web content (e.g. content for which copyright permission has not been obtained)
- Identification and remedy of technical problems (e.g. broken links, incorrectly rendered images)
- Gathering contact details of team members, contributors
- Readying 3rd party consent documents, software licenses and ad hoc contract documents
- Gathering details on technology transfer and relevant institutional involvement (generally the creator's employing institution will be the owner of the material and a technology transfer office or legal advisor will need to be involved)
|
|
| Some practical issues: experience from ReStore
|
|
Experience with the ESRC ReStore project allows identification of the types of online resource that are particularly in need of investment in sustainability:
|
- Resources that will not 'work' unless updated and maintained. For these resources a lack of updating means that some or all of the investment would be lost. Typical examples are web resources that relate to a particular analysis package that routinely changes, e.g. SPSS, STATA, NVIVO. In some cases exemplars will no longer run if users are not working with the same software version as the web resource. In other cases the software now features a new development, and resource pages which do not reflect this become unhelpful.
- Resources that provide a coherent and integrated website with a large amount of material, where a failure to update would quickly result in poorer quality and a consequent fall-off in value. Many resources require links to other pages that, if no longer available, will quickly reduce the value of the resource. In some cases these resources will continue to be updated by the authors because they have a continuing work programme in that area, but this is unlikely to provide a coherent maintenance strategy.
|
|
Other considerations include the question of access control and that of how long a resource should be maintained. It is recommended that, wherever possible, standalone resources should not be restricted to certain users, for example by using federated access management or localised username and password systems. Not only does this reduce the user base of the resource, but it can create extremely complex maintenance implications, requiring continual security updates, adaptation to reflect broader access control technologies and active user management. It may be essential to protect specific content, but in this case it is far better to consider placing the resource with an established archive or data centre that is able to support user access controls on an ongoing basis.
|
|
It is generally only practical to guarantee the active maintenance of online resources for fixed periods of time - it is recommended that ongoing maintenance be reviewed after no more than three years to ensure that the resource is still serving a user base and is not in need of more major updating or reworking. Factors that need to be considered include: the likely cost of further maintenance; the level of use of the resource; whether there is likely to be a change in technology that may either make the resource redundant or would require such a fundamental re-write that it would not be cost-effective. If a well-maintained resource ceases to merit active continuation that may be an ideal time to consider its transfer into a static web archive.
|
|
What is a web resource all about?  |
|
| Simple project sites |
|
These contain a few static web pages and typically list administrative details about the project, in much the same way as would be used in a grant application form or final report. They set out the membership of the team (investigators, researchers etc.), contact details, aims and objectives, perhaps a few presentations or working papers, and useful links. Much of this detail is also required by the funding body for inclusion in their own website, e.g. ESRC Society Today (http://www.esrcsocietytoday.ac.uk). Although useful during the lifetime of the project, these sites contain limited academic content and the key information is likely to be retained elsewhere on the web. Such a site may be a potential target for an archiving initiative for various purposes, but the direct benefits to the end user are limited. Such a web site would normally fail to meet the criteria set out for the ReStore repository in Appendix A.
|
| Reference sites |
|
These may contain all the information from a simple project site but also contain a library of some important reference material - perhaps datasets or publications resulting from the project activities. These materials can be of considerable academic value, although unless the project has had a very high profile, the project site is unlikely to provide the most visible location for them. Funding bodies are generally likely to require the deposition of important datasets in some form of archive (e.g. the UK National Archive [4] for major ESRC-funded datasets or UKDA Store [5] for self-archiving of smaller collections). These major repositories are effectively very large reference sites. Similarly, there is increasing interest from universities in the collation of all their staff's publications into institutional repositories, and ESRC require the inclusion of publications arising from the projects they fund within ESRC Society Today. In addition to publishers' own websites and library catalogues, publication details are becoming increasingly searchable and retrievable by multiple routes. A final consideration is that the original author will rarely have the rights to put copyrighted material on their own website for general distribution, hence the most prestigious outputs cannot readily be shared in this way. Although useful for work-in-progress, project-specific sites rarely provide the best long-term containers for important research materials, which are better placed into well-recognised, maintained archives and repositories.
|
| Resource sites |
|
These sites generally result from projects with a methodological development/training and capacity building focus and at least a part of their aim is to provide training in research methods or techniques. Such sites may be assembled either as a series of static pages or with some level of interactivity facilitated by use of a scripting language and/or database. By their nature, these types of resource may contain a variety of content including relevant publications, sample data, quizzes, presentations, etc. and use a variety of media types. These sites are not readily placed within publications repositories or data archives, nor are they easily assimilated into research project catalogues. Their utility is greatest when they are readily accessible and well-maintained as users seeking to learn new methods will be easily put off by broken functionality. This latter category is the particular concern of the ReStore project - and of these guidance notes. Essentially, ReStore aims to provide a well-recognised, maintained archive for precisely this type of online material.
|
| Personal web sites, blogs, twitter, etc. |
Personal web sites, as the name suggests, are created by individuals and generally contain personal information, interests, opinions and beliefs made available on the web. Traditional personal pages are technically no different to the site types listed above, but numerous additional web formats will be found in use, particularly blogs (literally "web logs") in which the author records their thoughts or views sequentially by adding small snippets of information (or in the case of Twitter, extremely small snippets) for sequential publication on a web page. Where an academic researcher is well-known and publishes high quality content by any of these means, these sites may be of significant academic interest, but they generally share the same weaknesses as the sites noted above, in particular their liability to fall into disuse if that individual should move institution or otherwise cease to maintain the entire project to a high standard.
Blogs are a ubiquitous component of online life, having emerged in recent years as a pervasive, interactive medium for communication and information dissemination [6].
Deep web sites, besides having the attributes and components of other web sites, maintain an active connection with a remote or local database in order to display content in response to users' inputs. In other words, a web site is called a deep web site when some of its content is generated on the fly after users have submitted a specific request, such as a search for a product, dataset or book title. A typical example would be UKDA Store (http://www.data-archive.ac.uk/), where large collections of datasets and other digital objects are kept and can be accessed through web pages that have an active connection with the underlying database.
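The idea of content generated on the fly can be illustrated with a minimal sketch. It assumes a small in-memory SQLite database standing in for a real archive; the table name, the dataset titles and the function names are all made up for illustration and do not describe how UKDA Store or any other archive is actually implemented.

```python
import sqlite3
from html import escape


def build_catalogue():
    """Create a small in-memory catalogue standing in for an archive database.

    The table layout and the three titles are invented for this sketch.
    """
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE datasets (title TEXT, year INTEGER)")
    connection.executemany(
        "INSERT INTO datasets (title, year) VALUES (?, ?)",
        [
            ("Household Survey Teaching Dataset", 2005),
            ("Interview Transcripts Collection", 2007),
            ("Census Boundary Lookup Tables", 2001),
        ],
    )
    return connection


def render_search_page(connection, term):
    """Build the HTML for a results page at the moment a user asks for it."""
    rows = connection.execute(
        "SELECT title, year FROM datasets WHERE title LIKE ? ORDER BY year",
        (f"%{term}%",),  # parameterised query: the search term is never pasted into the SQL
    ).fetchall()
    # Escape everything that ends up in the page so user input cannot inject mark-up.
    items = "".join(f"<li>{escape(title)} ({year})</li>" for title, year in rows)
    return f"<h1>Results for {escape(term)}</h1><ul>{items}</ul>"


if __name__ == "__main__":
    print(render_search_page(build_catalogue(), "Survey"))
```

No results page exists until a request arrives, which is why a static snapshot of such a site captures only the pages that happened to be requested when the snapshot was taken.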
|
|
|
Alternative initiatives focusing on sustaining online resources  |
|
In this section we will briefly review the merits of alternative initiatives for preservation, encouraging the author to consider whether any of these would be appropriate for their own resources.
|
|
Maintenance by universities
|
|
To include consideration of repositories maintained by individual universities and the potential advantages (or requirements) of placing project outputs in these.
|
|
Maintenance by a commercial company
|
|
In general it is unlikely that a methods-related resource would be adopted by a commercial company and made freely available to users online. However, there may be some exceptions with respect to software companies where, for example, an online resource to provide software training could, conceivably, be adopted and maintained by the software house and made freely available on the web. Although resources that promote specific commercial software packages tend not to be funded by research councils, there may be some situations where adoption by a commercial company is an appropriate option. However, these situations are likely to arise rarely and will depend heavily on individual contacts and agreements. |
|
Maintenance by a community of users
|
|
There are various instances of academic research projects forming the core for an ongoing online community, which effectively self-maintains a set of online resources. Users who benefit from a specific resource (e.g. they direct their students to it) and also have a direct interest in the content (i.e. it maps onto their own research interests) may be willing to adopt responsibility for its long term maintenance. Building a community of interest could be facilitated by setting up an advisory committee, at the point of project funding, which represents that community. Long-term maintenance could be recognised by providing public recognition of the quality of work being undertaken. (This could take a number of forms and is not discussed in detail here.) The "community of users" may be a viable option for some resources. However, it requires a continuing level of commitment from a small number of core people. The CD-LOR (Community Development of Learning Object Repositories) project investigated these issues in more detail, particularly examining the characteristics of learning communities (http://www.academy.gcal.ac.uk/cd-lor/learningcommunitiesreport.pdf). It is perhaps significant that the community-building and user engagement activities which appear to be necessary prerequisites for success in this area are not readily developed as part of the type of ESRC-funded research and development projects which create the online resources of interest here. The alternative option of using a wiki as a tool that would enable anyone to contribute or edit material was not seen as providing sufficient quality assurance by those consulted. Successful communities are either very large (e.g. the Linux open source software project) or underpinned by commercial software vendors who incorporate community-generated content (e.g. STATA). |
|
The role of repositories like ReStore
|
|
This section identifies the position of ReStore within the broad range of options discussed here, and stresses in particular the funding model and the importance of ongoing collaboration with the original resource authors.
|
|
Initiatives like the UK Web Archive, the UK Data Archive and others focus mainly upon the preservation and curation of web content; having preserved a snapshot of the web resource, they have little further involvement with its creators and/or authors. ReStore differs in this regard by ensuring collaboration with the authors before and after restoration in order to keep the web resource up to date and fully functional. This additional work comes at a cost, but it fulfils the basic aim of web resource preservation in its true spirit. It also helps in maintaining the resource by publishing up-to-date content on it in collaboration with the original creator.
|
| Proposer vs. funding provider
|
| The frequent contractions in today's financial system force governmental and non-governmental funding bodies to think carefully and stringently before sanctioning and releasing funds to a fund-seeking organization. In most cases, a proposer has to argue the case beyond the more common proposal terms, i.e. cost, benefit and risks, and say more about the impact and value of the outcome. The impact may be assessed in tangible and/or intangible ways, which may involve a finished web resource, a document repository, knowledge, experience gained, etc. When it comes to web resources, creating them has become far easier in today's world of social networking and web communities. How to preserve valuable content, and how to develop a strategy to ensure that what is produced will ultimately be preserved selectively, is however the most daunting task.
|
| ReStore, apart from the very basic idea of collecting or harvesting web resources, has gone further towards setting out a strategy which aims to help the proposer or fund seeker address the post-funding scenario in their proposal, in order to stress "value" and "impact" from the very beginning of the project. This approach would not only instil a sense of responsibility in investigators, developers and managers but would also make sure that the trustworthiness and reliability of the content being created remain intact. An online resource created with preservation in mind from the start of the project would save the time and resources currently required to restore a web resource into the ReStore repository after the cessation of initial funds. This in turn will augment the impact of the finished web product at very low cost, thus giving a very high return on investment to funders like ESRC. The aim of including this section in this document is to stress the importance of planning for web resource preservation before the resource is even created. Some of the areas the ReStore team would suggest may include:
|
|
Assessment of web resource content quality  |
|
In this section we review the criteria that can help to determine whether or not an academic web resource needs to be sustained and what work will be involved. Clearly, investing in maintenance of a web resource which is of no practical value or significance to researchers would be a waste of time and resources. Factors which have to be considered before deciding to actively sustain a resource include the following:
|
- Does the resource have an active user base?
- Are the contents of the web resource being used and referenced by researchers and students as part of their academic activities?
- Are the contents of the resource of high quality and up to date?
- Have the developers and investigators taken sufficient care to avoid copyright infringement while uploading content, research papers, software tools and datasets?
|
|
Just because there may be problems with some specific aspect of a site (for example, a particular area which has become out of date), this does not mean that it should not be sustained. Rather, the answers to these types of question can help determine whether the benefits of restoring and sustaining the resource outweigh the likely costs.
|
|
| Appropriate content referencing |
| For academic resources, much the same principles regarding referencing apply as to conventional academic journal and book publication. Firstly, all sources should be appropriately acknowledged and key, up-to-date academic sources should be cited. In addition, it is particularly important to be sure that no third party content has been included in the resource without the permission of its original copyright holders (something we consider in more detail in the section of these guidance notes concerning Intellectual Property Rights) and also that references to other online materials are properly described and working correctly. One of the greatest sources of frustration to users of online resources is to find that interesting links do not work. It follows that there is a significant maintenance burden associated with ensuring that any web resource is regularly checked for broken links and that these are promptly corrected or replaced with appropriate alternatives.
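Part of this checking can be automated. The following is a minimal sketch only, using the Python standard library; the function names are illustrative, and it assumes that any response with a status of 400 or above, or no response at all, counts as "broken". A production checker would also need to respect crawl etiquette, follow the site's own internal pages, and retry with a GET request for servers that reject HEAD requests.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> elements in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-page anchors and non-web schemes; resolve relative links.
        if href and not href.startswith(("#", "mailto:", "javascript:")):
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all followable links in the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def link_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # the server answered, but with an error such as 404
    except (URLError, TimeoutError, ValueError):
        return None  # DNS failure, refused connection, timeout or malformed URL


def find_broken_links(html, base_url, status_of=link_status):
    """Return (url, status) pairs for links that look broken.

    status_of can be swapped for a stub so the logic can be tried offline.
    """
    broken = []
    for url in extract_links(html, base_url):
        status = status_of(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Run on a schedule against each page of a resource, a report of this kind gives the maintainer a short list of links to correct or replace, rather than relying on users to report failures.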
|
| Quality of academic content |
| Again, the academic content in a web resource should reflect the standards expected of conventional academic publication in the equivalent field. The quality of content on a web page is essentially what determines its potential value to other researchers, although there is great scope to harness the power of the web to present academic arguments in ways which are simply not possible through conventional publication, for example through use of interactivity and combination of a variety of media. A high standard of writing, such as would be expected in a refereed journal paper, is equally applicable online. Particular areas requiring care are spelling and grammar, paragraph length and composition, lists (static and drop down), sentence length and contextual relevance. It is very easy for authors whose writing medium is not primarily online to become distracted by the technical requirements of web authoring and to inadvertently produce content which is of a lower standard than they would apply to conventional academic outputs.
|
| Adoption of a clear message and perspective across the entire web resource usually helps to establish a good affinity with the user of the materials. Because web resources are not subject to a regular framework of academic review and the associated indication of quality that comes from publication in a recognised book series or highly-cited journal, the web resource user must rapidly form their own judgement about the quality and reliability of the resource. Although web pages comprise many different elements such as text, images, navigational buttons, menus, etc., it is still the textual composition which is particularly influential. For example, search engines cannot read graphics or parts of a page containing client side scripting code (please see the section on standards for further details) and are therefore wholly reliant on the textual content. Well thought-out keywords and keyword phrases make a big difference in conveying a positive message to users about the authenticity of the content and the depth of the resource author's knowledge of the subject.
|
| Part of the strategy for producing quality content on a web resource site should be adherence to a "quality first and quantity second" rule. Application of such a rule should result in providing users with a satisfying experience and encouraging them to return in the future. A good question to consider is whether the material is of sufficient academic and presentational quality that an interested researcher is likely to want to bookmark the site. Another factor which clearly contributes to the quality of content in a resource site is ensuring that contents are updated regularly and that links to external web resources (URLs) reflect updates in the linked resources (e.g. data releases, software versions, etc.).
|
| Consistency of quality |
| Content quality must be exhibited consistently throughout a web resource site. It frequently happens that the home page draws much more attention from resource creators and enthusiasm fades away when it comes to the more detailed pages. The primary reason may be the assumption that users always start from the home page, but this is not necessarily the case: it is entirely likely that search engines will deliver users directly to a page deep within the site which appears to contain topical content. Thus every page potentially has to function as a first entry point to the entire resource.
|
- Ensure that all content on your site is properly reviewed and tested before it is uploaded.
- Ensure that the global web site design and style is followed on every page.
- Adopt a single standard practice for hyperlinking to internal and external web pages. Maintain the same colour and text style for such links throughout the site.
|
| Frequency of content updates |
| Frequently updating content on a web resource site (e.g. every few weeks or months) will encourage users to return. Updating should take into account activities such as:
|
- Regular random checks of user interface components such as buttons, home page link, footer links, and links embedded in obvious menus etc.
- Replacing outdated links with updated (internal and external) links. Google Webmaster, a free tool for monitoring all links in your site, can be of great help in carrying out this kind of update. For more information, please see www.google.com/analytics/
- Updating links to any internal and external software download pages
- Updating links to publications i.e. replacing old versions with newer ones
- Reviewing suggestions from users (normally received through an online contact or feedback form) and taking action by updating content or links on the site
- Redirecting web pages to new pages in cases where the older one has been deleted or is no longer valid (usually for external sites)
- Carrying out multiple browser compatibility tests on a regular basis to ensure that web pages are rendered correctly in the major web browsers. Such testing is necessary because browser software is regularly updated by its developers, which in some cases may change the look and feel of web pages.
- Updating text, images and other content if identified as necessary by regular review of the site
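|
Where a page has moved and the resource author is not able to change the server configuration, one HTML-only way of carrying out the redirection mentioned in the list above is to leave a short "meta refresh" page at the old address. The following is a minimal sketch only: the address http://sitename.ac.uk/new-page.html is a placeholder and the five second delay is an arbitrary choice.

```html
<!-- Contents of the old page, left in place at the old address -->
<html>
<head>
  <title>This page has moved</title>
  <!-- Wait 5 seconds, then load the new address (placeholder URL) -->
  <meta http-equiv="refresh" content="5; url=http://sitename.ac.uk/new-page.html">
</head>
<body>
  <p>This page has moved to
     <a href="http://sitename.ac.uk/new-page.html">http://sitename.ac.uk/new-page.html</a>.
     Please update your bookmarks.</p>
</body>
</html>
```

The visible link is included so that users whose browsers do not follow the refresh can still reach the new page. Where the institutional IT service is able to set up a permanent redirect on the server itself, that is generally the preferable option.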
|
| Appropriate description of artefacts |
| In web terminology, anything usable by users of a web page (e.g. links, clickable buttons, menus, etc.) is referred to as an artefact. It is essential that such elements are labelled and presented in a way which clearly describes their purpose and encourages the user to make use of them. Menu items or buttons whose purpose is not entirely clear are unlikely to be effective in drawing users in to additional material and functions.
|
| Superfluous material |
| Authors used to writing academic text may find it tempting to create materials which are over-long for online use. Key considerations should be to seek and remove all superfluous materials and to ensure that everything presented is clearly structured, particularly avoiding long passages of unbroken text. A simple device for reducing superfluous material in the web resource may be to provide a link to a more lengthy conventional document which can be downloaded and studied separately by the user who prefers to work in this way. It should be assumed that web users will rapidly move on to another site if they feel that the page they are reading has wandered off the topic in which they are interested. Also, the inclusion of significant superfluous material may reduce the effectiveness of indexing by search engines and thus reduce the overall number of visitors to the site.
|
| Content typography |
| Good typography improves the organization and aesthetic appeal of text and other artefacts on a web page, making it more legible. Research users will generally be most interested in the core intellectual content of the site, as conveyed especially through its text. The resource author should therefore give consideration to at least the following elements:
|
- Language of the content
- Typesetting e.g. creating headings, captions, bold/normal texts, simple and drop down lists etc.
- Hierarchy in content e.g. prioritisation of content to make certain parts appear more prominently and arranging content into hierarchical categories
- Font e.g. which font size should be used for a heading, sub heading and normal text in a web page
- Layout e.g. intertwining typesetting with other graphical elements such as images, logos, video files and other illustrations
- Colour e.g. colour of text, main and sub headings, hyperlinks, and page and menu background and foreground colours etc.
- Rhythm e.g. the structural arrangement of the various artefacts in a page and overall web site
|
| Web resource usage statistics |
|
The collection of web resource usage statistics is key to understanding users' activity on a web site. Such statistics form a key component of any consideration of whether a resource merits further investment in maintenance, for example through the ReStore project. The following are some commonly used, freely available usage statistics tools of the type which we would encourage resource authors to use from the outset of their respective web resource creation projects:
|
- Google Analytics
- Google Webmaster & Google Sitemaps
- AWStats
|
|
Google Analytics
|
|
Google Analytics will provide insight into user activity and web site traffic on a daily, weekly, monthly and yearly basis, which is quite sufficient for the needs of most academic research resource sites. It is straightforward to install and configure and easy to use, providing a range of activity reports. For details, see www.google.com/analytics/.
|
|
Google Webmaster and Google Sitemaps
|
|
This tool is in many ways an extension to Google Analytics, as it can aid in improving web resource site traffic, which will be reflected in the analytics reports. The tool also helps in recording users' actions on the web site such as keywords searched, listing functional and non-functional links, showing errors which occurred in various web pages when they were requested by users, etc. For details please see http://www.google.com/support/webmasters/.
|
|
In order to feed this tool with all the URLs in a web site so that it can keep track of activity, a Google Sitemap file is widely used. A Google Sitemap is an XML file containing all URLs within a web resource site (including those normally hidden from search engines), written using the Google Sitemap Protocol. All of the URLs contained in the file are available for crawling by search engines, which can ultimately raise the ranking of a web resource site, thereby increasing its visibility to potential users. Once a Google Sitemap has been manually created, it is submitted to the Google Webmaster tool. For details of how to create and use Google Sitemaps, see http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184
|
|
AWStats
|
|
Unlike the Google-based site statistics tools, AWStats requires resource developers to have some technical expertise in web server configuration. It may also require a basic understanding of web programming in some widely used scripting languages such as PHP, Perl etc. For more details on using AWStats, see http://awstats.sourceforge.net/.
|
|
Principles for sustainable web resources  |
|
The sustainability of web resources is greatly enhanced if they are well structured and technically stable. Not only does this make sites easier to maintain in the long term, but these are of course also characteristics that will endear them to users. The purpose of this guidance is not to replicate the many relevant standards that have been published and are available elsewhere, but to highlight principles that should be considered by the author of any research methods resource. Many of the themes covered here have been identified through our experiences working with creators and users of ESRC-funded research methods resources and we particularly emphasize issues relevant to the authors of such resources. Although many of these principles may seem obvious to readers who are very familiar with using the web, it is surprisingly easy for creators of academic sites to overlook some of the basic standards of web design, especially where this is being combined with authoring the academic content which is naturally the key focus of attention.
|
|
Getting this right may appear to be a distraction from an academic project, but getting it wrong may well prevent users from staying on a site long enough to reach the academic content! It will be seen that paying due regard to web standards can in many cases improve the suitability of resources for long-term preservation and improve their indexing and ranking by search engines.
|
|
Our starting assumption is that the reader already has a good idea of what the academic content of their site will be and understands that web pages are assembled from instructions written in hypertext markup language (HTML) which can be created using a wide variety of software editors. We do not assume the reader to have an extensive grasp of HTML.
|
| Web site Accessibility |
|
"The power of the Web is in its universality".
Tim Berners-Lee, W3C Director and inventor of the World Wide Web
|
|
The term accessibility has more than one meaning in relation to web resource design and development. In general, the term refers to reliable fault-free access to web content regardless of the software, hardware, operating system, etc. employed by the end user. In addition, a fully accessible web resource will be able to deliver its content to users with disabilities in order that they are able to benefit equally from use of the web.
|
| Usability Standards |
|
In simple terms, the usability of a web site reflects how easily and efficiently a user can navigate through the constituent pages. Achieving good usability standards requires creators to come up with content that is engaging, appropriate and relevant to the primary users of the site. Thus it should be clear that usability standards are not merely about navigation, visual design, functionality or interactivity. Nevertheless, a good user interface does play a central role in enhancing the usability of content, thereby increasing value and creating impact.
|
| Consistency in design |
|
The human mind constantly searches for patterns and this is especially the case when exploring a web site. If no consistent pattern is apparent, the user is soon likely to look elsewhere. Consistency in design involves much more than presentation style, extending to organization of content and the entire experience of user interaction. Design of a single web page or entire site should always keep in view the intended primary users. Once finalised, the design should be replicated across the entire site, usually using a single Cascading Style Sheet (CSS) file which controls the appearance of every page and avoids users becoming distracted by variations in style.
|
|
A well-crafted style should still be flexible enough to accommodate necessary changes such as the introduction of a new link or button on every web page on the site. If the design is such that a developer or author has to go to every single page to insert the code for the new link or button, this will be highly inefficient and will often lead to errors and inconsistencies. By contrast, if the style and layout files are kept separate from the page contents, such changes are relatively quick and simple.
|
| User-driven navigation |
|
In any web site the structure and organization of content plays a pivotal role in guiding users around the site. The structure encompasses the formation and placement of link vocabulary (link titles, names, phrases, etc.); availability of core links on every page to facilitate easy navigation; visibility of each navigational entity in individual pages and across the entire site; and flexibility to accommodate future changes. This is by no means an exhaustive list but indicates factors which deserve careful attention at the design stage. Several measures may be adopted to enhance navigation, such as construction of a well-structured main menu (typically including home, contact details, an accessibility statement, terms and conditions, etc.) and a detailed sitemap, which are made available from all the core pages. Such devices help users to understand what content is available and where to find it. An omnipresent main menu can give users an immediate impression of the coverage and depth of a site, helping them to see what is on offer and encouraging them to explore further. Some basic navigational elements are:
|
- Linking back to the home page from every page on a site
- Displaying a "breadcrumb" trail which always clearly shows the user where they are within the site structure (e.g. "Home > Contact us > Complaints department").
- Placing page jumps which link the various sections of a single page, aiding exploration of more lengthy pages
- Having an omnipresent facility to search across the entire site
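|
The navigational elements listed above might be marked up along the following lines. This is an illustrative sketch rather than a recommended template: the file names, section names and the /search address are all placeholders, and the search form assumes that some server-side search facility exists to receive the query.

```html
<!-- Link back to the home page, repeated on every page -->
<p><a href="/index.html">Home</a></p>

<!-- Breadcrumb trail showing where the user is within the site -->
<p>You are here:
  <a href="/index.html">Home</a> &gt;
  <a href="/contact.html">Contact us</a> &gt;
  Complaints department
</p>

<!-- Page jumps linking to the sections of a lengthy page -->
<ul>
  <li><a href="#overview">Overview</a></li>
  <li><a href="#procedure">Complaints procedure</a></li>
</ul>

<h2 id="overview">Overview</h2>
<p>Text of the first section.</p>

<h2 id="procedure">Complaints procedure</h2>
<p>Text of the second section.</p>

<!-- Site-wide search box; /search is a hypothetical server-side script -->
<form action="/search" method="get">
  <label for="q">Search this site</label>
  <input type="text" id="q" name="q">
  <input type="submit" value="Search">
</form>
```

Each page jump works by matching the text after the # in the link to the id of the heading it points to.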
|
| Memorable layout |
|
Alongside usability, it is important that the way a web site works is memorable and consistent. Ideally, a user should be able to learn their way around a site and memorise its navigational style in one, or very few, visits. If they have to keep relearning the function or location of links, buttons, menu items, or widely varying page layouts, they will most likely not return. This will decrease the impact of the original development and its academic content. Working with an inconsistent site design is analogous to cooking in a kitchen in which there is no governing logic for where utensils are kept, and continually having to learn, for example, that forks are not in the same drawer as knives, or that essential ingredients are scattered across a variety of cupboards!
|
| Memorable URLs |
|
A Uniform Resource Locator (URL) in the form http://sitename.ac.uk/filename.html comprises three parts, namely the protocol (http), domain name (sitename.ac.uk) and file name (filename.html). URLs which are simple, short and meaningful can be easily memorised and shared among a community of site users, and may make the difference between someone telling another user about the site or not. The URL of a web resource should not be longer than 78 characters to avoid wrapping across a line feed inside an HTML editor, email message or browser. Shorter URLs are easier to spell and people often directly type them into their browser rather than accessing them from bookmarks or searches.
|
|
A web resource creator should take care to choose a short, meaningful domain name and file naming convention. If registering a new domain name, consideration should be given to finding a name which will ideally tell users as much as possible about the site and its content. It should be descriptive, meaningful and free from jargon or special characters (e.g. *_ -£#@?>< etc.). Spaces cannot be used within a domain name. Some basic guidance on naming files is provided later.
|
| Web browser compatibility |
|
A web browser is simply a software application used for viewing web pages on the Internet. Commonly used web browsers currently include Internet Explorer, Firefox, Opera, Google Chrome, Flock and Safari. In order to be confident that a web page will be displayed correctly on a user's web browser, browser compatibility tests must be carried out before uploading the page to the web site. We strongly recommend that web resource creators test every single web page in multiple browsers as and when pages are created and uploaded into their sites. All of the above browsers are available for download free of charge and can be installed with minimum features to conserve disk space. Academic resource authors should bear in mind that the leading academic in their field, or indeed those charged with reviewing the outcomes of their project, may be dedicated enthusiasts of a different web browser to themselves!
|
|
A typical web page consists of many elements e.g. text, images, audio and video files, style sheet, JavaScript code snippets, etc. which the user's browser interprets and displays as an integrated whole. However, in some cases, a browser may not be capable of interpreting one or more of these elements. For example, it may fail to interpret JavaScript code because the user has not enabled JavaScript in their browser settings. If this is the case, the browser will simply skip that part of the page and render the rest, possibly with a message indicating potential errors. It is therefore strongly recommended that developers using JavaScript provide a parallel server side implementation of the same function. Since this runs on the server rather than the browser, the probability of serving an incomplete page will be substantially reduced.
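|
A complementary technique, which is not the server side approach recommended above but is often used alongside it, is the HTML noscript element. This lets the author supply alternative content for users whose browsers do not run JavaScript. The sketch below is illustrative only; the file name static-version.html is a placeholder for whatever non-scripted alternative the author provides.

```html
<script type="text/javascript">
  // Runs only when JavaScript is enabled in the user's browser
  document.write('<p>The interactive version of this page has loaded.</p>');
</script>

<noscript>
  <!-- Displayed only when JavaScript is disabled or unavailable -->
  <p>This page uses JavaScript for its interactive features.
     A <a href="static-version.html">static version of this content</a>
     is also available.</p>
</noscript>
```

With this in place, a user without JavaScript sees an explanation and a working alternative instead of a gap in the page.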
|
| Character encoding |
|
Information within computers is stored and transmitted as bits (binary digits), generally grouped as bytes, which must be converted to characters before being readable by the user. This applies equally to the content of web pages: it is a series of bits that is sent to the user's browser and not the characters which we expect to see on the screen. These are assembled into recognisable characters according to a Character Encoding (CE), which is the set of rules telling the browser how to convert the bits and bytes to characters. There is more than one such standard. ISO-8859-1, for example, is usable for most West European languages, while UTF-8 is another CE standard which uses a different number of bytes for different characters. Detailed explanations are not necessary here, and the reader will find more details on the W3C web site at http://www.w3.org/TR/html4/charset.html.
|
| Which encoding to use and where? |
|
CE is either set automatically in the web server configuration file, or the web resource creator can insert a CE directive in the <head> section of their HTML web pages. If the server sends a CE in its HTTP header, browsers give that setting priority; the directive in the page is used when the server sends none, for example when the page is opened from a local copy. It is therefore always good practice to add the directive, making sure that it agrees with any server setting. The line required for manually adding CE into the <head> section of a web page is:
|
Place the following line between <head> and </head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Or
|
Place the following line between <head> and </head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
Based on our experience of ReStoring various web resources, we would recommend the use of ISO-8859-1 for all web pages unless there are specific reasons for doing otherwise.
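|
To show where the CE directive sits in relation to the rest of a page, the following minimal sketch places it at the top of the head section, before the title. The title and body text are placeholders.

```html
<html>
<head>
  <!-- Declare the encoding first, before any text such as the title -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Example research methods resource</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```

The directive only describes the file: the page must also actually be saved in the stated encoding by the editor used to create it, otherwise accented letters and symbols such as the pound sign may be displayed incorrectly.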
|
| Web Server software |
|
A web server is here referred to as the software system responsible for processing users' requests submitted from a web browser: it is this system that acts as the container for the web site. Many academic site developers will have little awareness of the setup and configuration of the web server they are using, because it will be maintained and operated by dedicated IT staff within their institution and they may have to do little more than place their web pages in a specified folder. However, there are aspects of the server which are worthy of consideration here and there may even be instances where the requirements of a specific project merit the setup of a dedicated server.
|
|
A server delivers web pages (usually developed in HTML) and associated content (e.g. images, style sheets, JavaScript) to clients. If a web page contains server side programming script, e.g. Java Server Pages, ASP (Active Server Pages), VBScript, Perl, etc., then this is processed by the server before the page is displayed to users in their respective web browsers. Such a page is generally termed a dynamic web page. An online HTML form where a user enters their contact details is a typical example of a dynamic web page. This may result in both the display of a welcome message in their browser and the storage of their records on the server. These types of pages cannot be created without some understanding of the server. No detailed consideration is given to these systems here, but resource authors should be aware that institutions will most likely have specific policies about the types of web server (e.g. Apache Web Server, IIS (Internet Information Services), Apache Tomcat) they are able or willing to support. There are licensing, security, maintenance and support implications of different server decisions and resource authors are therefore recommended to discuss their requirements with their local IT service provider at an early stage in order to understand the options available to them and whether there will be additional costs involved.
|
| Use of Cascading Style Sheets (CSS) |
|
CSS is a simple mechanism for adding style (e.g. fonts, colours, spacing, captions, titles and special effects) to web documents. Different CSS style elements may be applied to an HTML document in order to make a web page appear in a user's browser in the way the developer intends. However, some CSS style properties are interpreted differently by different browsers. This is one of the reasons why we recommend testing all web pages during development in all the major web browsers after any CSS has been applied. It is also strongly recommended to use a single CSS file to control the look and feel of the entire web site. By using a single CSS file, any style-related anomalies in a page or across the whole site can be fixed by making modifications in just one place. Much more extensive details are provided about CSS at http://www.w3.org/standards/webdesign/htmlcss which offers up to date tutorials and other learning materials for developers from beginner to expert.
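|
In practice, using a single CSS file means that every page carries the same one-line reference to it in its head section, as in the sketch below. The path /css/site.css and the class name "summary" are placeholders chosen for illustration.

```html
<html>
<head>
  <title>Page title</title>
  <!-- Every page on the site links to the same style sheet (placeholder path) -->
  <link rel="stylesheet" type="text/css" href="/css/site.css">
</head>
<body>
  <h1>Page heading</h1>
  <!-- Appearance comes from rules in the shared file, not from the page itself -->
  <p class="summary">A short summary paragraph.</p>
</body>
</html>
```

Because the page contains no style information of its own, a change to the heading font or the colour of the summary paragraph is made once in the shared file and appears on every page that links to it.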
|
| CSS vs. Frames |
|
Frames are another way of adding a style to a web page or embedding one page within another. Using CSS is slightly more difficult than implementing frames but is the correct and most effective way of controlling the style of a web resource consistently between different browsers. Frames are also ignored by leading search engines which means that site content inside frames will not be indexed, thereby reducing the chances of potential users finding the site.
|
| Modularisation |
|
Modularisation is the splitting of the different sections (e.g. header, body, footer, etc.) of web pages and saving each of them under a unique file name, so that they can be maintained and updated in one place for the whole site. Modularisation is a good practice which we would strongly encourage web resource authors to follow. One way of achieving this is to develop web pages using SHTML, which is similar to HTML except that such pages are assembled on the server and not in the client side browser. (The S in SHTML refers to Server Side Includes.) An SHTML web page easily lets you "INCLUDE" other pieces into HTML using special directives such as "#include". For further details on server side includes, please see http://en.wikipedia.org/wiki/Server_side_include.
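|
As an illustration of modularisation with server side includes, the sketch below shows a page whose header and footer are held in separate files. It assumes a server on which server side includes have been enabled (this should be confirmed with the local IT service), and the paths /includes/header.html and /includes/footer.html are placeholders.

```html
<!-- page.shtml: assembled by the server before being sent to the browser -->
<html>
<head>
  <title>Page title</title>
</head>
<body>
  <!--#include virtual="/includes/header.html" -->

  <p>Content specific to this page.</p>

  <!--#include virtual="/includes/footer.html" -->
</body>
</html>
```

The server replaces each include directive with the contents of the named file, so the user's browser receives one complete HTML page. Adding a new link to the header then means editing header.html once rather than editing every page on the site.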
|
| File naming |
|
Frequently the naming of files during web resource creation is undertaken without any overall guiding conventions being adopted. It would be good practice to consider the following questions:
|
- Is the file name meaningful and does it reflect the content of the file?
- Is it memorable and ideally no more than 10-15 characters long?
- Is there any reason why it needs to use upper case, lower case or both? The implications may vary according to the type of server being used.
- Does it really need to contain any special characters such as underscore (_) or hyphen (-)?
- What should the file extension be? html would reflect a static web page, while asp, jsp, php would reflect the scripting language used
|
|
This list does not cover all relevant considerations but should help enormously in enhancing the maintenance and sustainability of the site.
|
|
|
|
Under no circumstances should a special character such as an apostrophe be contained within a file name (e.g. filename's.htm). Such file names are not recognised by some servers, and if the file name has to be included in any web scripting code it is very likely to cause errors.
|
|
Another area already mentioned is whether static or dynamic files are to be created. Most dynamic web page files end with php, jsp, asp, aspx or pl, and semi-dynamic files with shtml. The decision to develop a dynamic page which will be processed by the web server and then passed on to the user's browser should be made with reference to the contents of the file. If the resource author wants to render a dynamically generated web page in the user's browser, such as the web form mentioned above, then the correct file type should be used for the scripting language and server in question.
|
|
We strongly recommend that resource authors consult their institutional IT service providers before deciding to create dynamic web pages. This discussion will bring to light the type of web server on which their resources will be running and whether the intended operations can be implemented and supported.
|
| Descriptive hyperlink text |
|
A hyperlink is descriptive when the linking text is sufficiently meaningful that the user can correctly predict the nature of the page it is linking to. Thus, authors should avoid using text such as "Click here", "Go to", "Read more" etc. as these will be meaningless when the page is indexed by search engines. Meaningful hyperlinks enable search engines to crawl and index web pages effectively, thus enhancing the overall probability that they will be highly placed in the results displayed following a user's search. Well documented web pages leave a good impression on users, thus improving retention and reducing the rate at which users leave the site.
|
|
Ensuring that external links open in a new window is easily achieved (by using a "target" attribute in the hyperlink directive) and also means that the original resource still remains visible on the user's screen. It is good practice, however, to warn the user that clicking on an external link will open a new window in their browser.
|
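The difference between vague and descriptive link text, and the use of the "target" attribute, can be seen in the short sketch below. The addresses are ones already mentioned in this guidance; the wording of the links is our own illustration.

```html
<!-- Avoid: the link text says nothing about the destination -->
<p>For the AWStats documentation, <a href="http://awstats.sourceforge.net/">click here</a>.</p>

<!-- Prefer: the link text itself describes the destination -->
<p>See the <a href="http://awstats.sourceforge.net/">AWStats documentation</a>.</p>

<!-- External link which opens in a new window and warns the user that it will -->
<p><a href="http://dublincore.org" target="_blank">Dublin Core Metadata Initiative
   (opens in a new window)</a></p>
```

Placing the warning inside the link text means it is read out by screen readers and is still seen if the link is copied into a list of links elsewhere.
|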
| What does a web page contain? |
|
In order for a site or a web page to be easily found by new users, the creator needs to try to ensure that the site is ranked amongst the top results displayed by major search engines such as Google, Yahoo, MSN, etc. Thus, in terms of increasing impact, the way in which the content is presented to search engines may be of far more importance than the visual appearance of the site.
|
|
Search engines are the principal means by which users find new material on the web. Developing and enriching web pages with good metadata and well formatted content can help to ensure sustainability by increasing the chances of being found by these search engines. Quality of content, organization and structure of a web page all help to establish trustworthiness of the entire site. We list below some sections of a typical web page, each of which can be designed to increase impact and sustainability.
|
- Header
- Title
- Metadata
- Content (body)
- Web programming scripts (clever stuff)
- Footer
|
| Header |
|
The header is the most important graphical element of a web site as it provides instant recognition. The header and any images (e.g. logos, organizational slogans, etc.) contained inside it will be seen more than any other element of a typical site. It is therefore critically important that it looks good on every page. It is common practice for each page to have its own header script contained in the top section. This needs to be standardised by placing the header section into a separate file and including that file in each web page using the "Include" directive. We have already discussed how to use this directive with reference to the use of SHTML.
|
|
By doing this, a developer gets maximum control over the whole site and any modification can be made once and is immediately applied to every constituent page. This type of consideration is especially pertinent to long-term preservation where a small change (for example an acknowledgement reflecting the conclusion of a project) may need to be made to every page in a site. Another method would be to control the header through the CSS file. As discussed earlier, CSS enables a web resource author to control the overall look and feel, managing the available space in each web page so as to give a distinct identity to the site. By accessing all images, logos, etc. in a single CSS file and including the CSS file in the web page (as opposed to placing links to these items individually in each page), a developer not only makes life easier for themselves but also for anyone who may come to maintain the site at a later date.
|
| Title |
|
The title is of obvious importance because it informs users about the content of the page in a few words. Authors should review page content and take care to choose appropriate titles. Almost all search engines, after indexing a web page, identify it by its title. The more meaningful the title of the page, the greater the likelihood that it will be highly ranked in search results. In the absence of metadata in a web page, the title will be treated as the first metadata item by search engines. In addition, the page title is displayed as the title of the user's browser window or tab, allowing them, for example, to distinguish between multiple open tabs and windows.
|
| Metadata |
|
Metadata is typically described as "data about data", and is a central element in the creation of a web page. It gives the user information about the page, e.g. its location, author, date of creation, copyright, intended purpose etc. Various metadata standards can be adopted at the time of web resource creation, and doing so can considerably increase the visibility of a web resource in user searches. The most commonly used metadata standard is Dublin Core. Full details are provided at http://dublincore.org. The simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements, as follows:
|
|
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights
|
|
Each of these elements is optional and a web resource author is not required to provide descriptions for all 15 elements. Any element may be repeated if there are multiple relevant descriptive details. For example, the Contributor element can be included more than once if there is more than one contributor to page content. To create metadata for a simple web page, a Dublin Core Metadata Editor is available at http://www.ukoln.ac.uk/metadata/dcdot/.
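As an illustration of how optional and repeated elements look once embedded in a page, the sketch below generates HTML `<meta>` tags using the common `DC.` name prefix together with a `schema.DC` link to the Dublin Core element namespace. It is written in Python purely for demonstration; the function name `dublin_core_meta` and the sample values are hypothetical.

```python
from html import escape


def dublin_core_meta(elements):
    """Build <meta> tags from (element, value) pairs.

    Pairs may repeat an element name (e.g. two contributors), and any of
    the 15 elements may simply be left out.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in elements:
        # Escape the value so quotes or angle brackets cannot break the tag.
        lines.append(
            '<meta name="DC.{}" content="{}">'.format(name, escape(value, quote=True))
        )
    return "\n".join(lines)


# Hypothetical record: only four of the 15 elements, Contributor given twice.
record = [
    ("title", "Sustainable web resources guidelines"),
    ("contributor", "A. Author"),
    ("contributor", "B. Author"),
    ("language", "en"),
]

print(dublin_core_meta(record))
```

The resulting lines belong inside the `<head>` section of the page, alongside the title.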
|
|
Most similar sites offer automatic metadata extraction from a web page, which makes embedding metadata into the respective pages quite easy and less technical. Dublin Core is widely used for web resources because its elements cover the information that search engines are likely to need in order to index, preserve and retrieve a particular web page. Good metadata also facilitates the long-term digital preservation of pages using approaches such as Open Archives Initiative (OAI) harvesting (http://www.openarchives.org/).
|
| Content (body) |
|
The central part of a web page contains the actual content. This should always be placed between opening and closing body tags (i.e. <body> content </body>). If the body section is left empty, nothing will be seen on the user's screen. Attention must be given to the use of various special characters when formatting text within the body.
|
|
For example the "©" copyright symbol in an HTML file may not be interpreted correctly by browsers other than Internet Explorer. In the case of dynamic pages, web servers may not interpret the special symbol correctly. To get around this problem the HTML name for the symbol, "&copy;", should be used. Similarly, instead of using the symbols for less than "<" or greater than ">", the names "&lt;" and "&gt;" could be used to ensure that browsers render the page correctly. Almost all HTML editors offer access to these functions in their user interface menus. For further details on special character encoding please see the sections on Character Encoding. |
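Where text is prepared outside an HTML editor, the same substitution can be scripted. The following is a minimal sketch using Python's standard `html` module; the function name `to_entities` is hypothetical. It escapes the markup characters and then replaces every non-ASCII character with its HTML name where one exists, falling back to a numeric reference otherwise.

```python
from html import escape
from html.entities import codepoint2name


def to_entities(text):
    """Return text with <, >, & and all non-ASCII characters written as entities."""
    out = []
    # escape() handles &, < and >; quote=False leaves quotation marks alone.
    for ch in escape(text, quote=False):
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append("&{};".format(codepoint2name[code]))  # named, e.g. &copy;
        else:
            out.append("&#{};".format(code))  # numeric fallback
    return "".join(out)


print(to_entities("© 2009: a < b > c"))
# prints: &copy; 2009: a &lt; b &gt; c
```

The function should be applied to plain text only; running it over text that already contains entities would escape their ampersands a second time.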
| Client side scripts |
|
In addition to the above, a static HTML web page may also contain JavaScript code, which is interpreted by the user's browser rather than by the web server. For example, many static web pages include JavaScript to create drop-down effects when a user hovers their cursor over buttons or tabs within the page. Another common use is a chained menu, in which each list of options is generated from the user's previous selection. All such code is referred to as client side scripting. The following are examples of situations in which a web resource author needs to include client side script in their pages:
|
- Using Google Analytics for site usage statistics (http://www.google.com/analytics/)
- Generating a dynamic menu in a site e.g. drop down links menu such as that at http://www.restore.ac.uk/geo-refer/
- Creating special effects with clicks, hovering cursor, opening/closing current files
- Online form validation
- Optimum presentation of a web page by allowing specific sections (DIV) to be selectively displayed or hidden
|
| Footer |
|
The footer, as the name suggests, comes after everything else in a web page document. The footer of a web page may contain links to "Accessibility", "Contact", "Copyright statement" and "Disclaimer" pages. Like the header, the footer should ideally be kept in a separate file accessible through an "Include" directive. In the case of dynamic web pages, such as those written in PHP, the scripting language's own include statement would readily serve the purpose.
|
|
|
| References |
- WebCite, available at http://www.webcitation.org (accessed 25 Nov 2009)
- "Web 2.0" refers to the second generation of web development and web design. It is characterised as facilitating communication, information sharing, interoperability, and collaboration on the World Wide Web. (Wikipedia)
- Internet Archive Wayback Machine, available at http://www.archive.org/web/web.php (accessed 25 Nov 2009)
- The National Archives, available at http://www.nationalarchives.gov.uk/default.htm (accessed 25 Nov 2009)
- UKDA-Store home page, available at http://store.data-archive.ac.uk/store/ (accessed 25 Nov 2009)
|