Digitisation - Local History Digitisation Manual
The digitisation process has a particular impact on costing the project as it is here that the volume of items and staff time required per item can influence the overall project cost.
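The way volume and staff time per item combine can be illustrated with a rough sketch. The function name, the item count, the minutes per item and the hourly rate below are all hypothetical figures chosen for illustration; they are not taken from this manual or from Lee.

```python
def staff_cost(items: int, minutes_per_item: float, hourly_rate: float) -> float:
    """Estimate the staff cost of digitising a collection.

    All three inputs are assumptions that a project would need to
    establish for itself, for example by timing a small pilot batch.
    """
    hours = items * minutes_per_item / 60
    return hours * hourly_rate


# Hypothetical example: 2000 photographs, 6 minutes each, $30 per hour.
print(staff_cost(2000, 6, 30))

# Doubling the time needed per item doubles the staff cost.
print(staff_cost(2000, 12, 30))
```

Even a small change in the time needed per item has a large effect once it is multiplied across the whole collection.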
Stuart Lee (Lee, 92) lists four key cost variables to be considered when establishing the cost of the preparation and digitisation:
Once the variables have been established, the total cost of digitisation can be more easily estimated or alternatively a quote can be obtained from an external agency. The main cost components for in-house digitising are likely to be:
The type of digital file format chosen will depend upon both the original media of the physical object, and the uses proposed for the new digital object. The choice of an appropriate file format is one of the key decisions for a digitisation project.
Different original media types will require different conversion techniques as well as different file storage formats. This is an evolving area, as conversion techniques improve (better scanners and digital cameras) and new file formats develop. For further information see http://www.columbia.edu/acis/dl/imagespec.html#Quick_Guide
There are many different digital file formats even for a single original media type. Some file formats are referred to as 'open' standards. This means that the technical specifications for the format have been developed by a group of experts and agreed for usage across the computer industry. The technical specifications of these standards are openly available and this allows software companies to develop their products to handle items in these formats. Examples of 'open' standards include: JPEG (for images) and MPEG (for video and audio). Other file formats have been developed by individual companies and are therefore referred to as 'proprietary' standards. In some cases the companies will release the technical specifications and encourage wide use of the format, making the proprietary format a 'de facto' open standard. An example of a 'proprietary' standard is the Kodak 'ImagePac' PhotoCD. Some examples of 'proprietary' standards which have become 'de facto' standards include the TIFF and GIF image formats and the Adobe Acrobat PDF format for viewing text files. If a proprietary format is chosen, it should be remembered that future hardware or software systems may not be able to access or utilise these files. This is also possible with open standards, as they also change, but a migration path is arguably more likely to be available for open standards.
Use of file formats which have been well documented, have undergone thorough testing and are non-proprietary and usable on different hardware and software platforms minimises the frequency of future migration, improves sustainability of the resource, and reduces the risk and costs in their future maintenance.
To date digitisation projects in cultural organisations have generally focused on images, often photographs, manuscripts or artworks. Therefore there is a large amount of information available regarding image formats and appropriate resolution for this media type. In most cases these digital files are created using scanners or digital cameras. However it is also possible to digitise from microfilm.
Some image formats compress the digital file in order to reduce its size and this may result in a loss of information. For longer-term storage of digital images therefore, it is best to use uncompressed file formats, or those employing 'lossless' compression.
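The idea of 'lossless' compression can be illustrated with a minimal sketch using Python's standard zlib module. This is only an illustration of the principle, not a recommendation of a particular image format; the sample data is a hypothetical stand-in for raw pixel data.

```python
import zlib


def is_lossless_round_trip(data: bytes) -> bool:
    """Return True if compressing and then decompressing gives back
    exactly the same bytes, which is what 'lossless' means."""
    return zlib.decompress(zlib.compress(data)) == data


# Hypothetical stand-in for uncompressed image data: a repeating pattern.
original = bytes(range(256)) * 40

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed copy is smaller
print(restored == original)            # and nothing has been lost
```

A 'lossy' format, by contrast, discards some information to achieve a smaller file, so the original data cannot be recovered exactly.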
For digital images, it is important to establish the level of quality required by the user. It will therefore be necessary to make decisions not only about the file format chosen, but also about the quality levels applied within the chosen format when the item is digitised. Two main elements affect the image quality in most of the common digital image file formats: tonality (or bit-depth) and resolution (or dots per inch).
The bit depth of an image describes how many digital bits are used to colour each pixel. Pixels are the picture elements (or small dots) which make up images. The more bits used per pixel, the more different shades or colours are available in the image. This increases both the image quality and the resultant file size.
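As a small worked example, the number of distinct shades or colours available doubles with every extra bit per pixel. The function name below is illustrative only.

```python
def shades_available(bit_depth: int) -> int:
    """Number of distinct values one pixel can take at a given bit depth."""
    if bit_depth < 1:
        raise ValueError("bit depth must be at least 1")
    return 2 ** bit_depth


# 1-bit (black and white), 8-bit (greyscale) and 24-bit (colour) images.
for bits in (1, 8, 24):
    print(bits, "bits per pixel:", shades_available(bits), "possible values")
```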
This refers to the number of dots or pixels used per inch when undertaking the digital capture of each item. It is expressed in dpi (or dots per inch). In general the higher number of dots per inch an item is digitised at, the higher the quality of the resulting image. If it is necessary to view a high level of detail in the image, you will need to capture it at a higher dpi. Again, the higher the resolution, the larger the file size.
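The combined effect of resolution and bit depth on file size can be estimated with a short calculation. This is a sketch for uncompressed images only, and the 6 x 4 inch print used in the example is a hypothetical item.

```python
from typing import Tuple


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel width and height of an item scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bit_depth: int) -> int:
    """Approximate size of the raw image data, ignoring file headers."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 6 x 4 inch photograph in 24-bit colour.
print(uncompressed_size_bytes(6, 4, 300, 24))  # 300 dpi
print(uncompressed_size_bytes(6, 4, 600, 24))  # 600 dpi
```

Doubling the dpi doubles the pixel count in both directions, so the uncompressed file is four times as large.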
When digitising from microfilm, it must be remembered that the item being digitised from is much smaller than the original, so it is necessary to digitise at a very high dpi to ensure the appropriate quality in the final digital object.
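The reason for the very high dpi can be shown with a rough calculation. The sketch below assumes that the film image is a straightforward linear reduction of the original; the reduction ratio of 20 is a hypothetical figure, and the actual ratio should be taken from the film itself.

```python
import math


def film_scan_dpi(target_dpi: int, reduction_ratio: float) -> int:
    """Scanning resolution needed on the film to achieve target_dpi
    measured against the size of the original item.

    reduction_ratio is how many times smaller the film image is than
    the original (an assumption to be checked for each film).
    """
    return math.ceil(target_dpi * reduction_ratio)


# Hypothetical example: 300 dpi relative to the original page,
# from film on which the page was reduced 20 times.
print(film_scan_dpi(300, 20))
```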
The most common digital image file formats currently being used by cultural organisations are as follows:
** for further information see http://www.library.cornell.edu/preservation/tutorial/presentation/table7-1.html
There are other formats available, plus different formats for non-image media, and many of these may be more appropriate for specific applications.
The creation of digital images requires both hardware and software. The hardware uses:
light-sensitive material on a silicon chip to detect photons (the light emanating or reflecting from the source item), which are recorded electronically in the picture elements or pixels (Lee, 49)
In most cases the hardware used to create digital images will consist of a scanner or a digital camera. There are a number of types of scanners available. The most common are flatbed scanners, some of which have sheet feeders (or ADFs) attached. Most flatbed scanners only allow A4 documents or smaller to be scanned. However A3 scanners are available. There are also drum scanners which offer extremely high resolution scanning, slide scanners and microfilm scanners for specific applications. Most scanners come with their own scanning software, however additional software may be required for further image manipulation or to produce the required file format.
A range of digital cameras are now available, from domestic models which may provide limited resolution and format options, to extremely high quality professional equipment. High quality cameras are often mounted on stands so that they can be positioned at the correct angle to the source item. In addition they may need to be mounted above a cradle or similar device for the original material to rest on during the copying process. (Lee, 56)
A scanner operates much like a photocopier that produces a digital file. It uses a CCD (Charge-Coupled Device) to digitise the document. The main difference between scanners is the quality of the image produced by the CCD.
The best type of scanner to consider is a flatbed scanner. This is the most popular image capture device. Prices usually start at around $150 for an A4 model. It is possible to get a scanner with a transparency adaptor to allow you to scan negatives or slides. The more expensive models allow you to scan negatives and slides larger than 35mm, which can be an advantage as many older formats were larger than 35mm.
When considering a scanner, try to get the largest possible (A3) with a transparency adaptor and a high resolution that isn't interpolated. (Interpolation is the term used when the scanner increases the dpi through resampling.)
Before purchase, it is best to ask what the scanner's original (non-interpolated) dpi is. As a guide to the minimum resolution required, 600dpi is good for scanning for the web, and 1200dpi is the minimum required for negative or transparency scanning, with 24 to 30 bit colour depth.
Before you scan, you should decide on the intended purpose of the scan. Then you should scan at the maximum size required. For example, if the final file is to be stored for printing on a standard inkjet printer, it would be best to scan it at 300dpi and save this file as a TIFF. This is now your master scan. It can be saved onto a CD-ROM or a hard drive.
Using software such as Photoshop or Photoshop Elements you can then resize the image for web output at 72dpi, with pixel dimensions of 600 x 480 (a large image in a browser window) or 150 x 110 (a thumbnail image).
Picture Australia has set requirements for thumbnails. Thumbnails should be 150 pixels in their longest dimension (either width or height). The other dimension should be less than or equal to 150 pixels and set to whatever is appropriate to maintain the aspect ratio of the image. The following pixel dimensions are all valid:
150 x 110 Landscape
Note the smaller dimension (110 in these examples) will vary depending on the shape (aspect ratio) of the original item. It is possible to automate the generation of thumbnails from medium or high resolution versions of images.
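The thumbnail rule described above can be sketched as a small calculation. The function name and the source image sizes used in the example are hypothetical; only the 150 pixel limit on the longest dimension comes from the requirement described above.

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, longest: int = 150) -> Tuple[int, int]:
    """Scale pixel dimensions so the longest side equals `longest`
    while keeping the aspect ratio of the original image."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    largest = max(width, height)
    new_width = max(1, round(width * longest / largest))
    new_height = max(1, round(height * longest / largest))
    return new_width, new_height


# Hypothetical source images at medium resolution.
print(thumbnail_size(600, 440))  # landscape
print(thumbnail_size(440, 600))  # portrait
print(thumbnail_size(500, 500))  # square
```

The resulting dimensions could then be passed to whatever image software is being used to produce the thumbnails in bulk.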
A short rule of thumb for master scans
The following diagram shows how a digital master file can be resampled at different resolutions as required for use in different types of output media.
In order to ensure consistent quality of output during the digitisation process, prior benchmarking should be undertaken to determine the standards which will need to be applied to obtain the desired output quality for all items being digitised. This involves a process of evaluating the requirements for the image output and documenting these in technical terms. (for further information see Kenney & Rieger, 24 and Lee, 83)
Once images have been digitised, a process of post-digitisation quality evaluation should be put in place to check the output of the process for consistent quality and adherence to the agreed benchmark standards.
Non-image source material may require different file formats and quality standards.
When digitising items consisting predominantly of text it will be necessary to carefully consider the final use planned for the digital files. Depending on the proposed usage, it may be necessary to use OCR (optical character recognition) software in order to allow searching or manipulation of the text. Currently however, OCR software cannot guarantee perfectly correct results and it will be necessary to undertake further manual proof reading if 100% accuracy is required. Estimates of the cost of scanning using OCR software, PLUS the cost of manual checking and correction, range up to or even above the cost of manual keying into word processing or database software.
If the information to be copied and made accessible is in a format which is amenable to incorporation into a database, then there is a clear cost and functionality benefit in manually keying the information from the original into a database. Resources such as street directories (e.g. Sands & McDougall) or rate books may be best treated this way. For these, digitising is perhaps a red herring.
There is a wide range of potential digital audio and video formats available, and as with images, the higher the bit-depth used the better the quality. However, resulting file sizes can be extremely large, and compression technologies may reduce the quality of the output. The determining factor when establishing the most appropriate file format and standards to apply will be how the audio or video file is intended to be used.
It is possible to stream audio and video over the Internet; however, most formats are proprietary. Common proprietary formats for streaming audio and video include RealNetworks' RealPlayer http://www.real.com/, Windows Media Player http://www.microsoft.com/windows/windowsmedia/en/wm7/encoder.asp and QuickTime for Windows http://www.apple.com/quicktime/products