3
30(b)(6)
Under Federal Rule of Civil Procedure 30(b)(6), a corporation, partnership, association, or governmental agency is subject to the deposition process and is required to provide one or more witnesses to “testify as to matters known or reasonably available to the organization” without compromising attorney-client privileged communications or work product.
A
Ablate
Describes the process by which laser-readable “pits” are burned into the recorded layer of optical discs, DVD-ROMs, and CD-ROMs.
Ablative
Unalterable data. See Ablate.
Abuse of Privilege
Formal nomenclature for user actions not in accordance with organizational policy or law. Actions falling outside, or explicitly proscribed by, acceptable use policy.
Accountability
The principle that individuals using a facility or a computer system must be identifiable. With accountability, violations or attempted violations of system security can be traced to individuals who can then be held responsible.
Accuracy
Department of Defense parlance for the notion that information has been maintained and transferred in such a way as to be inviolate (the information has been protected from being modified or corrupted either maliciously or accidentally). Accuracy protects against forgery or tampering.
Acetate-base film
A safety film (ANSI Standard) substrate used to produce microfilm.
ACL
Access Control List. A security type used by Lotus Notes developers to grant varying levels of access and user privileges within Lotus Notes databases.
Active Data
Information residing on the direct access storage media of computer systems, which is readily visible to the operating system and/or application software with which it was created. Immediately accessible to users without undeletion, modification, or reconstruction. E.g., word processing and spreadsheet files, programs and files used by the computer’s operating system.
Active Records
Active Records are those Records related to current, ongoing or in-process activities and are referred to on a regular basis to respond to day-to-day operational requirements. An active record resides in native application format and is accessible for purposes of business processing with no restrictions on alteration beyond normal business rules. See Inactive Records.
ADC
Analog to Digital Converter. Converts analog data to a digital format.
Address
Addresses using a number of different protocols are commonly used on the Internet. These addresses include email addresses (Simple Mail Transfer Protocol or SMTP), IP (Internet Protocol) addresses, and URLs (Uniform Resource Locators; a.k.a. Web addresses).
ADF
Automatic Document Feeder. This is the means by which a scanner feeds the paper document.
AI
See Artificial Intelligence.
AIIM
The Association for Information and Image Management (www1.aiim.com). Focuses on electronic imaging.
Algorithm
A detailed formula or set of steps used to solve a particular problem. To be an algorithm, a set of rules must be unambiguous and have a clear stopping point.
Aliasing
When computer graphics output has jagged edges or a stair-stepped (rather than a smooth) appearance when magnified. The graphics output can be smoothed using anti-aliasing algorithms.
Alphanumeric
Characters composed of letters, numbers, and sometimes punctuation marks. Excludes control characters.
Ambient Data
See Latent Data.
Analog
Data in analog format is represented by continuously variable, measurable, physical quantities such as voltage, amplitude or frequency. Analog is the opposite of digital.
Annotations
The changes, additions, or editorial comments made or applicable to a document—usually an electronic image file—using electronic sticky notes, highlighter, or other electronic tools. Annotations should be overlaid and not change the original document.
ANSI
American National Standards Institute. A private, non-profit organization that administers and coordinates the U.S. voluntary standardization and conformity assessment system.
Aperture Card
An IBM punch card with a window that holds a 35mm frame of microfilm. Indexing information is punched in the card.
Application
A collection of one or more related software programs that enables a user to enter, store, view, modify, or extract information from files or databases. The term is commonly used in place of “program” or “software.” Applications may include word processors, Internet browsing tools, and spreadsheets.
Architecture
The hardware, software, or combination of both that comprises a computer system or network. The term “open architecture” is used to describe computer and network components that are more readily interconnected and interoperable. Conversely, the term “closed architecture” describes components that are less readily interconnected and interoperable.
Archival Data
Information an organization maintains for long-term storage and record-keeping purposes, not immediately accessible to the user of a computer system. Archival data may be written to removable media such as a CD, magneto-optical media, tape, or other electronic storage device, or may be maintained on system hard drives. Some systems allow users to retrieve archival data directly while other systems require the intervention of an IT professional.
Archive
See Archival Data. Also: After processing discovery materials, an archive is created for each case. Viruses found in processing are typically removed (a clean archive), program-related files are removed (per instruction, a purged archive), erased files are analyzed and recovered if possible, slack space is checked, files are grouped according to file classes, and metadata is added.
ARMA International
A not-for-profit association and recognized authority on managing records and information, both paper and electronic. (www.arma.org)
Artificial Intelligence (AI)
The subfield of computer science concerned with the concepts and methods of symbolic inference by computer and symbolic knowledge representation for use in making inferences; an attempt to model aspects of human thought on computers. It is also sometimes defined as trying to solve by computer any problem once believed to be solvable only by humans. AI is the capability of a device to perform functions that are normally associated with human intelligence, such as reasoning and optimization through experience.
ASCII
An acronym for American Standard Code for Information Interchange. A code that assigns a number to each letter, digit, punctuation mark, and control character so that text can be exchanged and read by most computer systems.
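As a minimal illustration of the character-to-number mapping, the sketch below uses Python's built-in `ord`, `chr`, and `str.encode`. The sample text is made up.

```python
# Each ASCII character corresponds to a number from 0 to 127.
code_for_a = ord("A")          # 65
char_for_97 = chr(97)          # 'a'

# Encoding text as ASCII yields one byte (one number) per character.
encoded = "Hi".encode("ascii")
codes = list(encoded)          # [72, 105]
print(code_for_a, char_for_97, codes)
```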
Aspect Ratio
The relationship of the height and width of any image. The aspect ratio of an image must be maintained to prevent distortion.
Attachment
A record or file associated with another record for the purpose of storage or transfer. There may be multiple attachments associated with a single “parent” or “master” record. The attachments and associated record may be managed and processed as a single unit. In common use, this term refers to a file (or files) associated with an email for retention and storage as a single message unit.
Attribute
An attribute is a characteristic of data that sets it apart from other data, such as location, length, or type. Sometimes used synonymously with “data element” or “property.”
Audit Trail
In computer security systems, a chronological record of when users logged in, how long they were engaged in various activities, what they were doing, and whether any actual or attempted security violations occurred. An audit trail is an automated or manual set of chronological records of system activities that may enable the reconstruction and examination of a sequence of events and/or changes in an event.
Author
The author of a document is the person, office, or designated position responsible for its creation or issuance. In the case of a document in the form of a letter, the author or originator is usually indicated on the letterhead or by signature. In some cases, the software application producing the document may capture the author’s identity and associate it with the document. For records management purposes, the author or originator may be designated as a person, official title, office symbol, or code. Synonymous with "originator."
AVI
Audio-Video Interleave. A Microsoft standard for Windows animation files that interleaves audio and video to provide medium-quality multimedia.
B
Backbone
The top level of a hierarchical network. It is the main channel along which data is transferred.
Backfiles
Existing paper or microfilm files.
Backup
To create a copy of data as a precaution against the loss or damage of the original data. Most users back up some of their files, and many computer networks utilize automatic backup software to make regular copies of some or all of the data on the network.
Backup Data
Information that is not presently in use by an organization and is routinely stored separately upon portable media, to free up space and permit data recovery in the event of disaster.
Backup Tape
Magnetic tape used to store copies of data, for use when restoration or recovery of data is required. Data on backup tapes is generally recorded and stored sequentially rather than randomly, so the tape must be read through in sequence in order to locate and access a specific file or data set.
Backup Tape Recycling
Backup Tape Recycling describes the process whereby an organization’s backup tapes are overwritten with new data, usually on a fixed schedule determined jointly by records management, legal, and IT sources. For example, an organization may use a nightly backup tape for each day of the week, with the daily tape for a particular day being overwritten on the same day the following week, while weekly and monthly backups are stored offsite for a specific period of time before being placed back in the rotation.
Bandwidth
The amount of information or data that can be sent over a network connection in a given period of time. Usually stated in kilobits per second (kbps) or megabits per second (Mbps).
Bar Code
A small pattern of vertical lines that can be read by a laser or an optical scanner. In records management and electronic discovery, bar codes are often affixed to specific records for indexing, tracking, and retrieval purposes.
Batch Processing
The processing of a large amount of data, or multiple records, in a single step.
Bates Number
Sequential numbering used to track documents and images in production data sets, where each page is identified by a unique production number. Often used in conjunction with a suffix or prefix to identify the producing party, the litigation, or other relevant information. See also Bates Production Number.
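A minimal sketch of sequential page numbering with a prefix follows. The prefix "ABC", the seven-digit width, and the function name `bates_numbers` are assumptions for illustration. Actual numbering conventions vary by matter and vendor.

```python
def bates_numbers(prefix, start, page_count, width=7):
    """Return one zero-padded, sequential Bates number per page."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + page_count)]

# A three-page document produced by a hypothetical party "ABC".
pages = bates_numbers("ABC", 1, 3)
print(pages)      # ['ABC0000001', 'ABC0000002', 'ABC0000003']
print(pages[0])   # the document's first page number
```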
Bates Production Number
A tracking number assigned to each page of each document in the production set.
Baud Rate
The number of times per second a communications channel changes the carrier signal it sends on a phone line. A 2400-baud modem changes the signal 2400 times a second.
BBS
Bulletin Board System. A computer system or service that users access to participate in electronic discussion groups, post messages, and/or download files.
BCS
Boston Computer Society. One of the first associations of PC/Apple users as well as one of the largest and most active.
Beginning Document Number
The Bates Number identifying the first page of a document or record. Also known as BegDoc#.
Bibliographical/Objective Coding
Extracting objective information from electronic documents such as date created and author/recipient/copies, after which it is associated with a specific electronic document.
Binary
Mathematical base 2, or numbers composed of a series of zeros and ones. Since zeros and ones can be easily represented by two voltage levels on an electronic device, the binary number system is widely used in digital computing.
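For illustration, the sketch below converts a number between decimal and binary using Python's built-in `format` and `int`. The value 13 is arbitrary.

```python
number = 13

bits = format(number, "b")        # '1101'  (8 + 4 + 0 + 1)
as_byte = format(number, "08b")   # '00001101', padded to eight bits
back = int(bits, 2)               # 13

print(bits, as_byte, back)
```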
BIOS
Basic Input Output System. The set of user-independent computer instructions stored in a computer’s ROM, immediately available to the computer when the computer is turned on. BIOS information provides the code necessary to control the keyboard, display screen, disc drives, and communication ports, in addition to handling certain miscellaneous functions.
Bit
Binary digIT. The smallest unit of computer data, consisting of either 0 or 1. There are eight bits in a byte.
Bit Map
Provides information on the placement and color of individual bits and allows the creation of characters or images by creating a picture composed of individual bits (pixels).
Bit Stream Backup
The backup of all areas of a computer hard disk drive or another type of storage media, e.g., Zip discs, floppy discs, Jaz discs. Such backups exactly replicate all sectors on a given storage device, therefore, all files and ambient data storage areas are copied. Also referred to as mirror image backups.
Bi-tonal
Black-and-white-only image.
Blog
Frequent, chronological Web publications consisting of links and postings. The most recent posting appears at the top of the page. Also referred to as Web logs.
BMP
A Windows file format for storing bit map images.
Bookmark
A link to a website or page previously visited.
Boolean Search
The term 'Boolean' refers to a system of logic developed by the 19th-century mathematician George Boole. In Boolean searching, an 'and' operator between two words results in a search for documents containing both of the words. An 'or' operator between two words creates a search for documents containing either of the target words. A 'not' operator between two words creates a search result containing the first word but excluding the second.
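The three operators map naturally onto set operations. The sketch below is a toy illustration only: the document collection and the `containing` helper are hypothetical, and real search tools typically use indexes rather than scanning text.

```python
# Hypothetical three-document collection.
documents = {
    "doc1": "merger agreement draft",
    "doc2": "merger press release",
    "doc3": "lease agreement",
}

def containing(word):
    """Return the set of document IDs whose text contains the word."""
    return {doc_id for doc_id, text in documents.items() if word in text.split()}

both = containing("merger") & containing("agreement")        # merger AND agreement
either = containing("merger") | containing("agreement")      # merger OR agreement
first_only = containing("merger") - containing("agreement")  # merger NOT agreement

print(sorted(both), sorted(either), sorted(first_only))
```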
Boot
To start up or reset a computer.
Boot Sector
The very first sector on a hard drive which contains the computer code (boot strap loader) necessary for the computer to start up and the partition table describing the organization of the hard drive.
BPI
Bits Per Inch. Measures data densities in disc and magnetic tape systems.
BPS
Bits Per Second.
Broadband
Communications of high capacity and usually of multimedia content.
Browser
An application, such as Internet Explorer or Netscape Navigator, used to view and navigate the World Wide Web and other Internet resources.
Bug
A problem with computer software or hardware that causes it to malfunction or crash.
Burn
Slang for making (burning) a CD-ROM copy of data, whether it is music, software, or other data.
Bus
A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other.
Business-process Outsourcing
Occurs when an organization turns over the management of a business function, such as accounts payable, purchasing, payroll, or information technology to a third party.
Byte
A unit of measure consisting of eight bits. Most computer data is measured in multiples of the byte. One million bytes is a 'megabyte', and one billion bytes is a 'gigabyte'. 1 gigabyte = 1,000 megabytes. 1 terabyte = 1,000 gigabytes.
C
Cache
Pronounced "cash." A special high-speed storage mechanism that can be either a reserved section of main memory or an independent high-speed storage device. Two types of caching are commonly used in personal computers: memory caching and disc caching.
A memory cache, sometimes called a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper dynamic RAM (DRAM) used for main memory. Memory caching is effective because most programs access the same data or instructions over and over, and by keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM.
Disc caching works under the same principle as memory caching, but instead of using high-speed SRAM, a disc cache uses conventional main memory. The most recently accessed data from the disc (as well as adjacent sectors) is stored in a memory buffer. When a program needs to access data from the disc, it first checks the disc cache to see if the data is there. Disc caching can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disc.
When data is found in the cache, it is called a cache hit, and the effectiveness of a cache is judged by its hit rate. Many cache systems use a technique known as smart caching, in which the system can recognize certain types of frequently used data. The strategies for determining which information should be kept in the cache constitute some of the more interesting problems in computer science.
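The hit rate can be illustrated with a small sketch. It uses Python's built-in `functools.lru_cache`, which keeps the most recently used results, as a stand-in for a real memory or disc cache. The `read_sector` function and the access pattern are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def read_sector(sector_number):
    """Stand-in for a slow disc read; the returned value is made up."""
    return f"data-from-sector-{sector_number}"

# Repeated requests for sector 1 are served from the cache.
for sector in [1, 2, 1, 1, 3, 1]:
    read_sector(sector)

info = read_sector.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
print(info.hits, info.misses, hit_rate)  # 3 3 0.5
```

A larger `maxsize` or a more repetitive access pattern would raise the hit rate.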
Caching
The temporary storage of frequently used data to speed access. See also Cache.
Case De-duplication
Retains only single copies of documents per case. For example, if an identical document resides with Mr. A, Mr. B, and Mr. C, only the first occurrence of the file will be saved (Mr. A's). Contrast with custodian de-duplication and production de-duplication.
Catalog
See Index.
CCD
Charge-Coupled Device. A computer chip, the output of which correlates with the light or color passed by it. Individual CCDs or arrays of these are used in scanners as a high-resolution digital camera to read documents.
CCITT
International Telegraph and Telephone Consultative Committee (from the French Comité Consultatif International Téléphonique et Télégraphique). Sets standards for phones, faxes, modems, etc. In imaging, its standards are used primarily for fax documents.
CCITT Group 4
A lossless compression technique or format that reduces the size of a file, generally about 5:1 over RLE and 40:1 over bitmap. CCITT Group 4 compression may only be used for bi-tonal images.
CD
Compact Disc. A type of optical disc storage media that comes in a variety of formats, including CD-ROM (“CD Read-Only Memory”) that are read-only; CD-R (“CD Recordable”) that are write to once and are then read-only; and CD-RW (“CD Re-Writable”) that can be written to multiple times.
CDPD
Cellular Digital Packet Data. A data communication standard utilizing the unused capacity of cellular voice providers to transfer data.
CD-R
Compact Disc Recordable. A CD-ROM on which a user may permanently record data once using a CD Burner.
CD-ROM
Data storage medium that uses compact discs to store about 1,500 floppy discs' worth of data. See also Compact Disc.
Centronics Interface
A parallel interface standard for connecting printers and other devices to computers.
Certificate
Digital signature combining data verification and encryption key. See also PKI Digital Signature.
CGA
Color Graphics Adapter. See VGA (Video Graphics Array).
Chaff
Advanced encryption technique involving data dispersal and mixing. Synonymous with "winnowing."
Chain of Custody
A process used to maintain and document the chronological history of electronic evidence. A chain of custody ensures that the data presented is 'as originally acquired' and has not been altered prior to admission into evidence. An electronic chain of custody link should be maintained between all electronic data and its original physical media throughout the production process.
Chain of Evidence
The sequencing of the chain of evidence follows this order:
1. Collection & Identification
2. Analysis
3. Storage
4. Preservation
5. Transportation
6. Presentation in Court
7. Return to Owner
The chain of evidence shows:
1. Who obtained the evidence
2. Where and when the evidence was obtained
3. Who secured the evidence
4. Who had control or possession of the evidence
Character Treatment
The use of all caps or another standard form of treating letters in a coding project.
Chat
A form of real-time communication between two or more people based on typed text. The text is conveyed via computers connected over a network such as the Internet. See also Instant messaging (IM).
Check Digit
One digit, usually the last, of an identifying field is a mathematical function of all of the other digits in the field. This value can be calculated from the other digits in the field and compared with the check digit to verify the validity of the whole field.
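As one concrete instance, the sketch below uses the Luhn algorithm, a widely used check-digit scheme. The glossary does not name a specific scheme, so this choice is an assumption made for illustration.

```python
def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of digits (check digit excluded)."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10

def is_valid(field):
    """Recompute the check digit from the other digits and compare it with the last one."""
    return luhn_check_digit(field[:-1]) == int(field[-1])

print(luhn_check_digit("7992739871"))  # 3
print(is_valid("79927398713"))         # True
print(is_valid("79927398710"))         # False
```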
CIE
Commission Internationale de l'Éclairage. The international commission on color matching and illumination systems.
Cine-Mode
Data recorded on a film strip such that it can be read by a human when held vertically.
Cinepak
A compression algorithm. See also MPEG.
CITIS
Contractor Integrated Technical Information Service. The Department Of Defense now requires contractors to have an integrated electronic document image and management system.
Client
Any computer system that requests a service of another computer system. A workstation requesting the contents of a file from a file server is a client of that file server. See also Thin Client.
Client-Server Architecture
An architecture whereby a computer system consists of one or more server computers and numerous client computers (workstations). The system is functionally distributed across several nodes on a network and is typified by a high degree of parallel processing across distributed nodes. With client-server architecture, CPU-intensive processes (such as searching and indexing) are completed on the server, while image viewing and OCR occur on the client. This dramatically reduces network data traffic and insulates the database from workstation interruptions.
Clipboard
A holding area that temporarily stores information copied or cut from a document.
Cluster
Clusters are fixed-length blocks of bytes that store data for Microsoft operating systems. A cluster is essentially a group of sectors used to allocate the data storage area in all Microsoft operating systems. Clusters range in size from one sector to 128 sectors, depending on the size of the logical storage volume and the operating system involved.
Cluster (File)
The smallest unit of storage space that can be allocated to store a file on operating systems that use a file allocation table (FAT) architecture. Windows and DOS organize hard discs based on Clusters (also known as allocation units), which consist of one or more contiguous sectors. Discs using smaller Cluster sizes waste less space and store information more efficiently.
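The effect of cluster size on wasted space can be shown with a short calculation. The file size and the two cluster sizes below are example values, and `allocated_bytes` is a hypothetical helper.

```python
import math

def allocated_bytes(file_size, cluster_size):
    """Space a file occupies on disc when storage is allocated in whole clusters."""
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size

file_size = 5_000  # bytes
for cluster_size in (4_096, 32_768):
    used = allocated_bytes(file_size, cluster_size)
    # Columns: cluster size, space allocated, space left unused in the last cluster.
    print(cluster_size, used, used - file_size)
# 4096 8192 3192
# 32768 32768 27768
```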
Cluster (System)
A collection of individual computers that appear as a single logical unit. Also referred to as matrix or grid systems.
Cluster bitmaps
Used in NTFS to keep track of the status (free or used) of clusters on the hard drive.
CMYK
Cyan, Magenta, Yellow, and Black. A subtractive method used in four color printing and Desktop Publishing.
Coding
Litigation Support: Automated or human process through which documents are examined and evaluated using predetermined codes, and the results of those comparisons are logged. Coding usually identifies names, dates, and relevant terms or phrases. Coding may be structured (limited to the selection of one of a finite number of choices) or unstructured (a narrative comment about a document). Coding may be objective (e.g., the name of the sender or the date) or subjective (e.g., evaluation as to the relevancy or probative value of documents.)
Medical: Assigning 'codes' to medical records to determine reimbursement.
Programming: Writing in a programming language to create a custom, semi-custom, or add-on application.
COLD
Computer Output to Laser Disc. A computer programming process that outputs electronic records and printed reports to laser disc instead of a printer.
COM
Computer Output to Microfilm. A process that outputs electronic records and computer generated reports to microfilm.
Comb
A series of boxes with their top missing. Tick marks guide text entry. Used in forms processing rather than boxes.
Comic Mode
Human-readable data, recorded on a strip of film which can be read when the film is moved horizontally to the reader.
Compliance Search
The identification of relevant terms and/or parties in response to a discovery request.
Component Video
Separates video into luminosity and color signals that provide the highest possible signal quality.
Composite Video
Combines red, green, and blue synchronization signals into one video signal so that only one connector is required; used by most TVs and VCRs.
Compression
Compression algorithms such as Zip and RLE reduce the size of files, saving storage space and reducing the bandwidth required for access and transmission. Data compression is widely used in backup utilities, spreadsheet applications, and database management systems. Compression generally eliminates redundant information and/or predicts where changes will occur. “Lossless” compression techniques such as Zip and RLE preserve the integrity of the input. Coding standards such as JPEG and MPEG employ “lossy” methods which do not preserve all of the original information and are most commonly used for photographs, audio, and video.
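A minimal sketch of run-length encoding (RLE) shows how a lossless method removes redundancy and still restores the input exactly. The function names and the sample string are illustrative, and production RLE formats differ in how they store the runs.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of a repeated character into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Expand (character, count) pairs back into the original text."""
    return "".join(char * count for char, count in pairs)

original = "WWWWWWBBBWWWW"
encoded = rle_encode(original)
print(encoded)                        # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(encoded) == original)  # True: nothing was lost
```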
Compression Ratio
The ratio of the size of an uncompressed file to a compressed file, e.g., with a 10:1 compression ratio, a 1 MB file can be compressed to 100 KB.
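The ratio can be measured directly. The sketch below uses Python's standard `zlib` module on a made-up, highly repetitive sample. The ratio achieved in practice depends heavily on the data being compressed.

```python
import zlib

# Repetitive sample data compresses well; real files will vary.
original = b"electronic discovery " * 500
compressed = zlib.compress(original)

ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes, about {ratio:.0f}:1")
```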
Computer
Includes but is not limited to network servers, desktops, laptops, notebook computers, mainframes, and PDAs (personal digital assistants).
Computer Evidence
Computer evidence differs from more traditional forms of documentary evidence. Unlike paper documentation, computer evidence is extremely fragile, and a copy of a document stored in a computer file is identical to the original. In addition, the legal "best evidence" rules differ for the processing of computer evidence. There is also the potential for unauthorized copies of important computer files to be made without leaving behind a trace that the copy was made.
Computer evidence is not limited to data stored in computer files; much relevant computer evidence is uncovered in locations that are not commonly known. For example, on Microsoft Windows and Windows NT-based computer systems, large quantities of evidence can be found in the Windows swap files or Page Files. In addition, computer evidence can also be uncovered in file slack and unallocated file space.
Computer Forensics
The use of specialized techniques for recovery, authentication, and analysis of electronic data when an investigation or litigation involves issues relating to reconstruction of computer usage, examination of residual data, authentication of data by technical analysis, or explanation of technical features of data and computer usage. Computer forensics requires specialized expertise that goes beyond normal data collection and preservation techniques available to end-users or system support personnel, and generally requires strict adherence to chain-of-custody protocols. See also Forensics and Forensic Copy.
Computer Investigations
Computer crimes are specifically defined by federal and/or state statutes and any computer documentary evidence utilized during a computer investigation may include computer data stored on floppy diskettes, zip discs, CDs, and computer hard disc drives. The evidence necessary to prove computer-related crimes can potentially be located on one or more computer hard disc drives in various geographic locations. This evidence can reside on computer storage media as bytes of data in the form of computer files and ambient data; however, ambient data is usually unknown to most computer users and is therefore often very useful to computer forensics investigators.
Computer investigations rely upon evidence stored as data as well as the timeline information for when files were created, modified, and/or last accessed by a computer user. Timelines of activities can be essential when multiple computers and individuals are involved in the commission of a crime. In addition, computer investigations generally involve the review of Internet log files to determine Internet account abuses and analysis of the Windows swap file. Using computer forensics procedures, processes, and tools, computer forensics investigators can identify passwords, network logons, Internet activity, and fragments of email messages that were dumped from computer memory during past Windows work sessions.
Concept Search
Searching electronic documents to determine relevance by analyzing the words and putting search requests in conceptual groupings so the true meaning of the request is considered. Concept searching considers both the word and the context in which it appears to differentiate between concepts such as diamond (baseball) and diamond (jewelry).
Content Comparison
A method of de-duplication that compares file content or output (to image or paper) and ignores metadata. See also De-duplication.
Contextual Search
The process of returning electronic evidence to its true context: when it was created, by whom, for what purpose, etc.
Continuous Tone
An image (e.g., a photograph) which has all the values of gray from white to black.
Convergence
Integration of computing, communications, and broadcasting systems.
Cookie
A message given to a Web browser by a Web server. The browser stores the message holding information on the times and dates a user has visited a website in a text file. Other information can also be saved to your hard drive in these text files, including information about online purchases, validation information about the user for 'Members Only' websites, etc. The message is then sent back to the server each time the browser requests a page from the server. The main purpose of cookies is to identify users and possibly prepare customized Web pages for them.
Corrupted File
A file damaged in some way, such as by a virus, or by software or hardware failure, so that it cannot be read by a computer.
COTS
Commercial Off-the-Shelf. Hardware or software products that are commercially manufactured, ready-made, and available for use by the general public without the need for customization.
CPI
Characters Per Inch.
CPU
Central Processing Unit. The primary silicon chip that runs a computer’s operating system and application software. It performs a computer’s essential mathematical functions and controls essential operations.
CRC
Cyclic Redundancy Check (also called Cyclical Redundancy Checking). Used in data communications to create a checksum character at the end of a data block to ensure integrity of data transmission and receipt.
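The idea can be sketched with the CRC-32 function in Python's standard `zlib` module. The framing used here (a 4-byte checksum appended to the block) and the names `add_crc` and `check_crc` are assumptions for illustration. Real protocols define their own CRC variants and frame layouts.

```python
import zlib

def add_crc(block):
    """Sender side: append a 4-byte CRC-32 checksum to the data block."""
    return block + zlib.crc32(block).to_bytes(4, "big")

def check_crc(frame):
    """Receiver side: recompute the checksum and compare it with the one received."""
    block, received = frame[:-4], frame[-4:]
    return zlib.crc32(block).to_bytes(4, "big") == received

frame = add_crc(b"production set 001")
corrupted = b"X" + frame[1:]   # simulate one byte damaged in transit

print(check_crc(frame))      # True
print(check_crc(corrupted))  # False
```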
CRM
Customer Relationship Management. Programs that help manage clients and contacts, normally used in larger companies. Often a significant repository of sales, customer, and sometimes marketing data.
Cross-custodian De-duplication
Culls a document to the extent multiple copies of that document reside within different custodians’ data sets. See also De-duplication.
CRT
Cathode Ray Tube. The picture tube of a computer monitor or television.
Cryptography
Technique to scramble data to preserve confidentiality or authenticity.
CSV
Comma Separated Value. A record layout that separates data fields or values with a comma and typically encloses data in quotation marks.
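The layout can be illustrated with Python's standard `csv` module. The field names and values below are made up. Note how the quotation marks allow a value to contain a comma.

```python
import csv
import io

rows = [
    ["BegDoc", "Author", "Title"],
    ["ABC0000001", "Smith, Jane", "Q3 report"],
]

# Write every field enclosed in quotation marks.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original fields, embedded comma included.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1][1])  # Smith, Jane
```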
Culling
To remove a document from the collection to be produced or reviewed. See also Filtering and Harvesting.
Custodian
A records custodian is an individual responsible for the physical storage and protection of records throughout their retention period. In the context of electronic records, custodianship may not be a direct part of the records management function in all organizations. For example, some organizations may place this responsibility within their Information Technology Department, or they may assign responsibility for retaining and preserving records with individual employees.
Custodian De-duplication
Culls a document if multiple copies of that document reside within the same custodian's data set. For example, if Mr. A and Mr. B each have a copy of a specific document, and Mr. C has two copies, the system will maintain one copy each for Mr. A, Mr. B, and Mr. C. Contrast with case de-duplication.
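The difference between case-level and custodian-level de-duplication can be sketched as follows. Identifying identical documents by a SHA-256 hash of their content is an assumption made for illustration, as are the sample collection and the `deduplicate` function. Actual tools may compare documents by other means.

```python
import hashlib

# Hypothetical collection of (custodian, file content) pairs.
collection = [
    ("Mr. A", b"contract v1"),
    ("Mr. B", b"contract v1"),
    ("Mr. C", b"contract v1"),
    ("Mr. C", b"contract v1"),
]

def deduplicate(items, per_custodian):
    """Return the custodians whose copies are kept, in order of first occurrence."""
    seen = set()
    kept = []
    for custodian, content in items:
        digest = hashlib.sha256(content).hexdigest()
        # Custodian level: a copy is a duplicate only within the same custodian.
        # Case level: a copy is a duplicate anywhere in the case.
        key = (custodian, digest) if per_custodian else digest
        if key not in seen:
            seen.add(key)
            kept.append(custodian)
    return kept

case_level = deduplicate(collection, per_custodian=False)
custodian_level = deduplicate(collection, per_custodian=True)
print(case_level)       # ['Mr. A']
print(custodian_level)  # ['Mr. A', 'Mr. B', 'Mr. C']
```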
Customer-Added Metadata
Data (any information stored on a computer) or work product created by a user while reviewing a document. For example, annotation text of a document or subjective coding information. Also see User-Added Metadata. Contrast with Vendor-added Metadata.
Cyber
Slang for information shared on the internet. See also Internet.
Cyberspace
See Cyber.
Cylinder
The set of tracks on both sides of each platter in the hard drive that is located at the same head position.
D
DAC
Digital-to-Analog Converter. Converts digital data to analog data.
DAD
Digital Audio Disc. Another term for Compact Disc.
DAT
Digital Audio Tape. A magnetic tape generally used to record audio but can hold up to 40 gigabytes (or 60 CDs) of data if used for data storage. Has the disadvantage of being a serial access device. Often used for backup.
Data
A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means. Any representations such as characters or analog quantities to which meaning is, or might be, assigned.
Data Analysis
Provides access to tools allowing users to perform sophisticated data analysis of both native data content and metadata. Features include:
• Basic keyword and Boolean search functionality
• Natural language and search query support
• Fuzzy logic and thesaurus-based search
• Advanced data mining capabilities, such as artificial intelligence, neural-network, and thematic data mapping search
Data Collection
See Harvesting.
Data Element
A combination of characters or bytes referring to one separate piece of information, such as name, address, or age.
Data Extraction
The process of retrieving data from documents (hard copy or electronic). The process may be manual or electronic.
Data Field
See Field.
Data Filtering
See Filtering.
Data Formats
The organization of information for display, storage, or printing. Data is maintained in certain common formats so that it can be used by various programs, which may only work with data in a particular format, e.g., PDF, HTML.
Data Harvesting
See Harvesting.
Data Integrity
Refers to the validity of data. Data integrity can be compromised in a number of ways, including:
• Human errors when data is entered
• Errors that occur when data is transmitted from one computer to another
• Software bugs or viruses
• Hardware malfunctions, such as disc crashes
• Natural disasters, such as fires and floods
There are many ways to minimize these threats to data, including:
• Backing up data on a regular basis
• Controlling access to data via security mechanisms
• Designing user interfaces that prevent the input of invalid data
• Using error detection and correction software when transmitting data
Data Mapping
Going beyond basic search capabilities, data mapping is also called keyless searching. It finds or suggests associations between files within a large body of data, which may not be apparent using other techniques.
Data Mining
Generally refers to techniques for extracting summaries and reports from an organization’s databases and data sets. In the context of electronic discovery, often refers to the processes used to cull through a collection of electronic data to extract evidence for production or presentation in an investigation or in litigation.
Data Set
A named or defined collection of data. See also Production Data Set and Privilege Data Set.
Data Streams
Microsoft introduced a data storage concept called data streams in Windows NT version 3.51. Data streams allow multiple forms of data to be associated with a single file, including any number of graphic files, databases, programs, spreadsheets, word processing documents, or other data types. The ability to associate such data with a given file alters some of the rules concerning computer security issues and computer forensics investigations.
Data Verification
Assessment of data to ensure it has not been modified. The most common method of verification is hash coding by some method such as MD5. See also Digital Fingerprint and File Level Binary Comparison.
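A minimal sketch of hash-based verification, using MD5 from Python's standard hashlib module. The sample data and function name are assumptions made for illustration: a hash recorded when the data is collected is later compared with the hash of a copy.

```python
import hashlib

def md5_of(data: bytes) -> str:
    """Return the MD5 hash of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

original = b"The quick brown fox jumps over the lazy dog"
recorded_hash = md5_of(original)  # recorded at collection time

unchanged_copy = b"The quick brown fox jumps over the lazy dog"
altered_copy = b"The quick brown fox jumps over the lazy cog"

print(len(recorded_hash))                       # 32 hexadecimal characters (128 bits)
print(md5_of(unchanged_copy) == recorded_hash)  # True
print(md5_of(altered_copy) == recorded_hash)    # False
```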
Database
In electronic records, a database is a set of data elements consisting of at least one file, or of a group of integrated files, usually stored in one location and made available to several users. In computing, databases are sometimes classified according to their organizational approach, the most prevalent being the relational database: a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. Another popular organizational structure is the distributed database, which can be dispersed or replicated among different points in a network. Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs and inventories, and customer profiles. SQL (Structured Query Language) is a standard computer language for making interactive queries from and updates to a database.
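As an illustration of SQL queries and updates against a relational database, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table, column names, and customer records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database held in memory

# Define a table and load a few hypothetical customer profiles.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Acme Corp", "Boston"), ("Globex", "Chicago"), ("Initech", "Boston")],
)

# An update followed by two queries.
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Denver", "Globex"))
boston = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Boston",)
).fetchall()
denver = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Denver",)
).fetchall()
conn.close()

print(boston)  # [('Acme Corp',), ('Initech',)]
print(denver)  # [('Globex',)]
```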
Daubert (challenge)
Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), addresses the admission of scientific expert testimony to ensure that the testimony is reliable before it is considered for admission pursuant to Rule 702. The court assesses the testimony by analyzing the methodology and applicability of the expert’s approach. Faced with a proffer of expert scientific testimony, the trial judge must determine first, pursuant to Rule 104(a), whether the expert is proposing to testify to (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact at issue. This involves preliminary assessment of whether the reasoning or methodology is scientifically valid and whether it can be applied to the facts at issue. Daubert suggests an open approach and provides a list of four potential factors:
(1) whether the theory can be or has been tested
(2) whether the theory has been subjected to peer review or publication
(3) known or potential rate of error of that particular technique and the existence and maintenance of standards controlling the technique’s operation
(4) consideration of general acceptance within the scientific community. 509 U.S. at 593-94
DBMS
Database Management System. A software system used to access and retrieve data stored in a database.
Decryption
Transformation of encrypted (or scrambled) data back to original form.
De-duplication
The process of providing one instance of an item where there were once two or more identical copies. This process usually involves landing all files into a database and then searching for duplicate files. Basic de-duplication is performed on a select and limited basis (i.e., for file names and types) and is usually based on the value of the entire electronic document. Also known as "de-duping."
De-fragment
Use of a computer utility to reorganize files so they are more contiguous on a hard drive or other storage medium, if the files or parts thereof have become fragmented and scattered in various locations within the storage medium in the course of normal computer operations. Used to optimize the operation of the computer, it will overwrite information in unallocated space. Also known as "de-frag." See Fragmented.
Deleted Data
Data that existed on the computer as live data and which has been deleted by the computer system or end-user activity. Deleted data may remain on storage media in whole or in part until it is overwritten or “wiped.” Even after the data itself has been wiped, directory entries, pointers, or other information relating to the deleted data may remain on the computer. “Soft deletions” are data marked as deleted (and not generally available to the end-user after such marking), but not yet physically removed or overwritten. Soft-deleted data can be restored with complete integrity.
Deletion
Deletion is the process whereby data is removed from active files and other data storage structures on computers and rendered inaccessible except through the use of special data recovery tools. Deletion occurs on several levels in modern computer systems:
• File level deletion renders the file inaccessible to the operating system and normal application programs and marks the storage space occupied by the file’s directory entry and contents as free and available to re-use for data storage
• Record level deletion occurs when a record is rendered inaccessible to a database management system, or DBMS (usually marking the record storage space as available for reuse by the DBMS, although in some cases the space is never reused until the database is compacted) and is also characteristic of many email systems
• Byte level deletion occurs when text or other information is deleted from the file content (such as the deletion of text from a word processing file); such deletion may render the deleted data inaccessible to the application intended to be used in processing the file, but may not actually remove the data from the file’s content until a process such as compaction or rewriting of the file causes the deleted data to be overwritten.
Descenders
The portion of a character that falls below the main part of the letter (e.g., g, p, q).
De-shading
Removing shaded areas to render images more easily recognizable by OCR. De-shading software typically searches for areas with a regular pattern of tiny dots.
De-skewing
The process of straightening skewed (tilted) images. De-skewing is one of the image enhancements that can improve OCR accuracy. Documents often become skewed when scanned or faxed.
Desktop
Usually refers to an individual PC—a user's desktop computer (as opposed to a network computer or server). Also refers to the main view on a PC that shows the standard shortcuts to specific applications and/or files.
De-speckling
Removing isolated speckles from an image file. Speckles often develop when a document is scanned or faxed.
DIA/DCA
Document Interchange Architecture/Document Content Architecture. An IBM standard for transmission and storage of voice, text or video over networks.
Digital
Information stored as a string of ones and zeros. Opposite of analog.
Digital Certificate
Electronic record that contains keys used to decrypt information, especially information sent over a public network like the Internet.
Digital Fingerprint
A fixed-length hash code that uniquely represents the binary content of a file. See also Data Verification, File Level Binary Comparison, and Hash Coding.
Digitize
The process of converting an analog value into a digital (numeric) representation.
Directory
A simulated file folder or container used to organize files and directories in a hierarchical or tree-like structure. UNIX and DOS use the term “directory,” while Mac and Windows use the term “folder.”
Disaster Recovery Tape
See Backup Tape.
Disc
A round, flat, magnetic storage medium, either floppy or hard, on which data is digitally stored. A disc may also refer to a CD-ROM. Also spelled disk.
Disc Mirroring
A method of protecting data from a catastrophic hard disc failure or for long-term data storage. As each file is stored on the hard disc, a “mirror” copy is made on a second hard disc or on a different part of the same disc. See also Mirror.
Disc Partition
A portion of a hard drive consisting of a set of consecutive cylinders.
Discovery
Discovery is the process of identifying, locating, securing, and producing information and materials for the purpose of obtaining evidence for utilization in the legal process. The term is also used to describe the process of reviewing all materials which may be potentially relevant to the issues at hand and/or which may need to be disclosed to other parties, and of evaluating evidence to prove or disprove facts, theories, or allegations. There are several ways to conduct discovery, the most common of which are interrogatories, requests for production of documents, and depositions.
Discwipe
Utility that overwrites existing data. Various utilities exist with varying degrees of efficiency; some wipe only named files or unallocated space of residual data, thus unsophisticated users who try to wipe evidence may leave behind files of which they are unaware.
Disk
See Disc.
Disposition
The final business action carried out on a record. This action generally is to destroy or archive the record. Electronic record disposition can include “soft deletions” (see Deletion), “hard deletions,” “hard deletions with overwrites,” “archive to long-term store,” “forward to organization,” and “copy to another media or format and delete (hard or soft).”
Distributed Data
Data that resides on portable media and non-local devices such as laptop computers, home computers, CD-ROMs, floppy discs, zip drives, wireless communication devices, personal digital assistants, web pages, Internet repositories such as email hosted by Internet service providers or portals, and the like that belongs to the organization and not the user.
Dithering
In printing, dithering is usually called halftoning, and shades of gray are called halftones. The more dither patterns that a device or program supports, the more shades of gray it can represent. Dithering is the process of converting grays to different densities of black dots, usually for the purposes of printing or storing color or grayscale images as black and white images.
DLT
Digital Linear Tape. A type of backup tape which can hold up to 80 GB depending on the data file format.
Document
Fed. R. Civ. P. 34(a) defines a document as “including writings, drawings, graphs, charts, photographs, phonorecords, and other data compilations.” In the electronic discovery world, a document also refers to a collection of pages representing an electronic file. Emails, attachments, databases, word documents, spreadsheets, and graphic files are all examples of electronic documents.
Document Date
The original creation date of a document. For an email the document date is indicated by the date-stamp of the email.
Document Imaging Programs
Software used to store, manage, retrieve, and distribute documents quickly and easily on the computer.
Document Metadata
Data about the document stored in the document, as opposed to document content. Often this data is not immediately viewable in the software application used to create/edit the document but often can be accessed via a “Properties” view. Examples include document author, company, and create/revision dates. Contrast with File System Metadata and Email Metadata. See also Metadata.
Document Retention
The preservation of documents and data—including hard copy and electronic documents, databases, and emails—that are created, sent, and received in an organization’s ordinary course of business.
Document Retention Policy
A systematic plan for document retention in an organization's ordinary course of business.
Document Type
A typical field used in bibliographical coding. Typical doc type examples include letter, memo, report, and article.
Domain
A sub-network of servers and computers within a LAN. Domain information is useful when restoring backup tapes, particularly of email.
Domino Database
Another name for Lotus Notes Databases versions 5.0 or higher. See NSF.
Dot Pitch
Distance of one pixel in a CRT to the next pixel on the vertical plane. The smaller the number, the higher the display quality.
Double-sided Scanning
Double-sided scanning uses a single-sided scanner to scan double-sided pages, scanning one collated stack of paper, then flipping it over and scanning the other side. Differentiated from duplex scanners.
DPI
Dots Per Inch. The measurement of the resolution of display in printing systems. A typical CRT screen provides 96 dpi, which provides 9,216 dots per square inch (96x96). When a paper document is scanned, the resolution (or level of detail) at which the scanning was performed is expressed in DPI. Typically, documents are scanned at 200 or 300 dpi.
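The arithmetic behind these figures can be sketched as follows. The function name and the letter-size page (8.5 x 11 inches) are assumptions chosen for illustration.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int):
    """Return (width, height, total) in pixels for a page scanned at the given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px

# 96 dpi on each axis gives 96 x 96 dots per square inch.
print(96 * 96)                    # 9216

# A letter-size page at the two typical scanning resolutions.
print(scan_pixels(8.5, 11, 200))  # (1700, 2200, 3740000)
print(scan_pixels(8.5, 11, 300))  # (2550, 3300, 8415000)
```

Moving from 200 to 300 dpi raises the dot count by a factor of 2.25 (1.5 squared), which is one reason higher-resolution scans produce larger image files.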
Draft Record
A preliminary version of a record before it has been completed, finalized, accepted, validated, or filed. Such records include working files and notes. Records and information management policies may provide for the destruction of draft records upon finalization, acceptance, validation, or filing of the final or official version of the record. However, draft records generally must be retained if (1) they are deemed to be subject to a legal hold, or (2) a specific law or regulation mandates their retention. Policies should recognize such exceptions.
Drag-and-drop
The movement of on-screen objects by dragging them with the mouse and dropping them in another location.
DRAM
Dynamic Random Access Memory. A memory technology that is periodically “refreshed” or updated—as opposed to “static” RAM chips which do not require refreshing. The term is often used to refer to the memory chips themselves.
Drive Geometry
A computer hard drive is made up of a number of rapidly rotating platters that have a set of read/write heads on both sides of each platter. Each platter is divided into a series of concentric rings called tracks. Each track is further divided into sections called sectors, and each sector is subdivided into bytes. Drive geometry refers to the number and positions of each of these structures.
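Under this model, a drive's capacity is the product of the counts of each structure. The sketch below assumes a hypothetical geometry (16,383 cylinders, 16 heads, 63 sectors per track, 512 bytes per sector); the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class DriveGeometry:
    cylinders: int           # tracks at the same head position across all platters
    heads: int               # one read/write head per platter side
    sectors_per_track: int
    bytes_per_sector: int = 512

    def capacity_bytes(self) -> int:
        """Total capacity implied by the geometry."""
        return (self.cylinders * self.heads
                * self.sectors_per_track * self.bytes_per_sector)

geometry = DriveGeometry(cylinders=16383, heads=16, sectors_per_track=63)
print(geometry.capacity_bytes())  # 8455200768
```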
Driver
A computer program that controls various devices such as the keyboard, mouse, monitor, etc.
DSP
Digital Signal Processor/Processing. A special-purpose computer (or technique) that digitally processes signals and electrical/analog waveforms.
DTP
Desktop Publishing. PC applications used to prepare direct print output or output suitable for printing presses.
Duplex
Two-sided page(s).
Duplex Scanners
Automatically scan both sides of a double-sided page, producing two images at once. Differentiated from double-sided scanning.
DVD
Digital Video Disc or Digital Versatile Disc. A plastic disc, like a CD, on which data can be written and read. DVDs are faster, can hold more information, and can support more data formats than CDs.
E
ECM
Enterprise Content Management.
EDI
Electronic Data Interchange. Eliminating forms altogether by encoding the data as close as possible to the point of the transaction; automated business information exchange.
EDM
Electronic Document Management. For paper documents, involves imaging, indexing/coding, and archiving of scanned documents/images, and thereafter electronically managing them during all life cycle phases. Electronic documents are likewise electronically managed from creation to archiving and all stages in between.
EDMS
Electronic Document Management System. A system to electronically manage documents during all life cycles. See EDM.
EGA
Enhanced Graphics Adapter. See VGA.
EIA
Electronic Industries Association.
EISA
Extended Industry Standard Architecture. One of the standard buses used for PCs.
Electronic Archive
See Archival Data.
Electronic Discovery
The process of collecting (also called “harvesting”), preparing, reviewing, and producing electronic documents in the context of the legal process. These documents include email, Web pages, word processing files, computer databases, and virtually anything that is stored on a computer. Technically, documents and data are “electronic” if they exist in a medium that can be read only through the use of computers. Such media include cache memory, magnetic discs (such as computer hard drives or floppy discs), optical discs (such as DVDs or CDs), and magnetic tapes. Also referred to as e-discovery, ediscovery, EDD (electronic data discovery), and ED.
Electronic Evidence
According to Black's Law Dictionary, evidence is "any species of proof or probative matter legally presented at the trial of an issue by the act of parties and through the medium of witnesses, records, documents, exhibits, concrete objects, etc. for the purpose of inducing belief in the minds of the court or jury as to their contention." Electronic information generally is admissible into evidence in a legal proceeding.
Electronic File Processing
Generally includes extraction of metadata from files, identification of duplicates/de-duplication, and rendering of data into delimited format.
Electronic Image
An electronic or digital picture of a document (e.g., TIFF, PDF). The most common image format used in e-discovery is TIFF (Tagged Image File Format).
Electronic Image Management (EIM)
A term coined to indicate the creation, management, and structure of electronic images (TIFF, PDF, GIF, JPG, etc.) within an organization.
Electronic Mail (Email)
Commonly referred to as email, an electronic mail message is a document created or received via an electronic mail system, including brief notes, formal or substantive narrative documents, and any attachments, such as word processing and other electronic documents, which may be transmitted with the message.
Electronic Record
Information recorded in a form that requires a computer or other machine to process it and that otherwise satisfies the definition of a record.
Electrostatic Printing
Paper is exposed to electron charge. Toner sticks to the charged pixels.
Em
In printing, a unit of measurement equal to the width of the letter “M” in a given font and size.
Email Address
An electronic mail address. Internet email addresses follow the formula user-ID@domain-name; other email protocols may use different address formats. In some email systems, a user’s email address is “aliased” or represented by his or her natural name rather than a fully qualified email address. For example, john.doe@abc.com might appear simply as John Doe.
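A minimal sketch of the user-ID@domain-name formula and of aliasing, using parseaddr from Python's standard email.utils module. The header value reuses the john.doe@abc.com example; the variable names are illustrative.

```python
from email.utils import parseaddr

# A header value that carries both the alias and the fully qualified address.
header_value = "John Doe <john.doe@abc.com>"

display_name, address = parseaddr(header_value)
user_id, _, domain_name = address.rpartition("@")

print(display_name)  # John Doe
print(address)       # john.doe@abc.com
print(user_id)       # john.doe
print(domain_name)   # abc.com
```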
Email Message Store
A top most email message store is the location in which an email system stores its data. For instance, an Outlook PST (personal storage folder) is a type of top most file that is created when a user’s Microsoft Outlook mail account is set up. Additional Outlook PST files for that user can be created for backing up and archiving Outlook folders, messages, forms, and files. Similar to a filing cabinet, which is not considered part of the paper documents contained in it, a top most store generally is not considered part of a family.
Email Metadata
Data stored in the email about the email. Often this data is not viewable in the email client application used to create the email. The amount of email metadata available for a particular email varies greatly depending on the email system. Contrast with File System Metadata and Document Metadata. See also Metadata.
Email String
See Email Thread.
Email Thread
A series of emails linked together by email responses or forwards. Comments, revisions, and attachments are all part of an email thread. Also referred to as an email string.
Encryption
A procedure that renders the contents of a message or file scrambled or unintelligible to anyone not authorized to read it. Encryption is used to protect information as it moves from one computer to another and is an increasingly common way of sending credit card numbers and other personal information over the Internet.
Encryption Key
A data value that is used to encrypt and decrypt data. The number of bits in the encryption key is a rough measure of the encryption strength; generally, the more bits in the encryption key, the more difficult it is to break.
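To illustrate why key length is a rough measure of strength, the sketch below counts the possible keys for a few key lengths. The lengths shown are examples only; actual strength also depends on the algorithm and how it is used.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys of the given length in bits."""
    return 2 ** bits

for bits in (40, 56, 128):
    print(bits, keyspace(bits))

# Each additional bit doubles the number of keys an exhaustive search must try.
print(keyspace(41) // keyspace(40))  # 2
```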
End Document Number
The last single page image of a document. Also referred to as EndDoc#.
Endorser
A small printer in a scanner that adds a document-control number or other endorsement to each scanned sheet.
Enhanced Title
A meaningful or descriptive title for a document. The opposite of Verbatim Title.
Enterprise Architecture
Framework for how software, computing, storage, and networking systems should integrate and operate to meet the changing needs across an entire business.
Enterprise User Information (EUI)
Email, including attachments, and user files.
EOF
End of File. A distinctive code which uniquely marks the end of a data file.
EPP
Enhanced Parallel Port. A new, industry-standard parallel port having higher transfer rates competitive with SCSI. Also known as Fast Mode Parallel Port.
EPS
Encapsulated PostScript. Uncompressed files for images, text, and objects. Only print on PostScript printers.
Erasable Optical Drive
A type of optical drive that uses erasable optical discs.
ESDI
Enhanced Small Device Interface. A defined, common electronic interface for transferring data between computers and peripherals, particularly disc drives.
ESI
Electronically stored information.
Ethernet
A common way of networking PCs to create a Local Area Network (LAN).
Evidentiary Image or Copy
See Forensic Copy.
Exabyte
A unit of 1000 petabytes. See Byte.
Export
Data extracted or taken out of one environment or application usually in a prescribed format and usually for import into another environment or application.
Extended Partitions
If a computer hard drive has been divided into more than four partitions, extended partitions are created. Under such circumstances each extended partition contains a partition table in the first sector that describes how it is further subdivided.
Extranet
An Internet-based access method to a corporate Intranet site by limited or total access through a security firewall. This type of access is typically utilized for joint ventures and vendor-client relationships.
Extrinsic Data
Information about the file, such as file signature, author, size, name, path, and creation/modification dates. This data is the accumulation of what is in the file, on the media label, discovered by the operator, and contributed by the user. Collectively, it represents the real value of examining an electronic file as opposed to the printed version.
F
False Positive/Negative
A result that is not correct. This may be a result of performing a process incorrectly or using a process that is not accurate.
Family Range
The range of documents from the first Bates Production Number assigned to the first page of the top most parent document through the last Bates Production Number assigned to the last page of the last child document.
Family Relationship
Formed among two or more documents that have a connection or relatedness because of some factor.
FAT
File Allocation Table. An internal data table on hard drives that keeps track of where the files are stored. If a FAT is corrupt, a drive may be unusable, yet the data may be retrievable with forensics. See also Cluster File.
FAX
Short for facsimile. A process of transmitting documents by scanning them to digital, converting to analog, transmitting over phone lines, reversing the process at the other end, and printing.
Fed. R. Civ. P.
Federal Rules of Civil Procedure. Also FRCP.
Fiber Optics
Transmitting information by sending light pulses over cables made from thin strands of glass.
Field
A name for an individual piece of standardized data, such as the author of a document, a recipient, the date of a document, or any other piece of data common to most documents in an image collection, to be extracted from the collection. Also referred to as Data Field.
Field Separator
A code that separates the fields in a record. For example, the CSV format uses a comma as the field separator.
File
Data stored under a specific name.
File Allocation Table
Microsoft operating systems store data in fixed-length blocks of bytes called clusters, with the size of these blocks depending on the type and size of the storage device. A File Allocation Table (FAT) is used to track the clusters that have been allocated to a specific file for Microsoft DOS, Windows, Windows 95, and Windows 98. The operating system relies upon the FAT to locate the data associated with a specific file, and references in the FAT act as pointers to identify clusters by numeric reference.
File Compression
See Compression.
File Extension
A tag of three or four letters preceded by a period that identifies a data file's format or the application used to create the file. File extensions can streamline the process of locating data. For example, if one is looking for incriminating pictures stored on a computer, one might begin with the .gif and .jpg files.
File Format
The organization or characteristics of a file that allow it to be used with certain software programs.
File Level Binary Comparison
Method of de-duplication using the digital fingerprint (hash) of a file. File Level Binary Comparison ignores metadata, and can determine that “SHOPPING LIST.DOC” and “TOP SECRET.DOC” are actually the same document. See De-duplication, Data Verification, Digital Fingerprint, and Hash coding.
File Server
When several or many computers are networked together in a LAN situation, one computer may be utilized as a storage location for files for the group. File servers may be employed to store email, financial data, word processing information, or to backup the network. See also Server.
File Sharing
The ability for computer systems networked together to share files that are stored on the file server.
File Signature
The information within the file about the true, program-related origin of the file and, therefore, its type. Tools for reading file signatures identify the true program source, even if the file extension has been changed.
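A minimal sketch of signature-based identification. The table lists a few widely documented leading byte sequences ("magic numbers"); the function name and sample header are assumptions, and real tools check far more formats.

```python
# Leading bytes for a handful of common file formats.
SIGNATURES = {
    b"%PDF": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
}

def identify(header: bytes) -> str:
    """Identify a file type from its first bytes, ignoring its name and extension."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return "unknown"

# Hypothetical case: a GIF renamed "notes.txt" still begins with the GIF signature.
renamed_file_header = b"GIF89a\x01\x00\x01\x00"
print(identify(renamed_file_header))  # GIF image
print(identify(b"plain text"))        # unknown
```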
File Slack Space
See Slack Space.
File System
The engine that an operating system or program uses to organize and keep track of files. More specifically, the logical structures and software routines used to control access to the storage on a hard disc system and the overall structure in which the files are named, stored, and organized. The file system plays a critical role in computer forensics because the file system determines the logical structure of the hard drive, including its cluster size. The file system also determines what happens to data when the user deletes a file or subdirectory.
File System Metadata
Data that can be obtained or extracted about a file from the file system storing the file. Examples include file creation time, last modification time, and last access time. Contrast with Document Metadata and Email Metadata. See also Metadata.
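As an illustration, Python's standard os.stat call returns metadata that the file system keeps about a file, separately from the file's content. The temporary file below is created only for the example.

```python
import datetime
import os
import tempfile

# Create a small temporary file to inspect.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"sample content")
    path = handle.name

info = os.stat(path)  # metadata supplied by the file system, not the file content
size_bytes = info.st_size
modified = datetime.datetime.fromtimestamp(info.st_mtime)
accessed = datetime.datetime.fromtimestamp(info.st_atime)

print(size_bytes)  # 14
print("last modified:", modified)
print("last accessed:", accessed)

os.remove(path)  # clean up the temporary file
```

Simply opening or copying a file can change some of these values, which is one reason collection procedures take care to preserve them.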
File Transfer
The relocation of named files from one computer or network to another.
Filename
The name of a file, excluding root drive and directory path information. Different operating systems may impose different restrictions on filenames, for example, by prohibiting use of certain characters in a filename or imposing a limit on the length of a filename. The filename extension should indicate what type of file it is. However, users often change filename extensions to evade firewall restrictions or for other reasons. Therefore, file types must be identified at a binary level rather than relying on file extensions. See also File Extension and Full Path.
Filtering
Electronic filtering of emails and files for privilege or by keyword, file type, or name. Filtering removes files that do not fit the search criteria and reduces the volume of data that requires further investigation. See also Culling.
FIPS
Federal Information Processing Standards issued by the National Institute of Standards and Technology after approval by the Secretary of Commerce pursuant to Section 111(d) of the Federal Property and Administrative Services Act of 1949, as amended by the Computer Security Act of 1987, Public Law 100-235.
Firewall
A set of related programs that protect the resources of a private network from users from other networks.
Flatbed Scanner
A flat-surface scanner that allows users to input books and other documents.
Floppy Disc
An increasingly rare storage medium consisting of a thin magnetic film disc housed in a protective sleeve. As opposed to Hard Disc. See also Disc.
Folder
See Directory.
Forensic Copy
A precise bit-by-bit copy of a computer system's hard drive, including slack and unallocated space.
Forensically Sound Procedures
Procedures used for acquiring electronic information in a manner that ensures it is "as originally discovered" and is reliable enough to be admitted into evidence.
Forensics
See Computer Forensics.
Form of Production
The manner in which requested documents are produced. Used to refer both to file format (native vs. PDF or TIFF) and the media on which the documents are produced (paper vs. electronic).
Format
The internal structure of a file that defines the way it is stored and used. Specific applications may define unique formats for their data (e.g., “MS Word document file format”). Many files may only be viewed or printed using their originating application or an application designed to work with compatible formats. Computer storage systems commonly identify files by a naming convention that denotes the format, and therefore the probable originating application. For example, “DOC” for Microsoft Word document files; “XLS” for Microsoft Excel spreadsheet files; “TXT” for text files; and “HTM” for Hypertext Markup Language files such as Web pages. Users may choose alternate naming conventions, but this may affect how the files are treated by applications.
Format (verb)
To make a drive ready for first use. Formatting is often erroneously thought to “wipe” a drive; typically, it overwrites only the file allocation table (FAT), not the files on the drive.
Forms Processing
A specialized imaging application designed for handling pre-printed forms. Forms processing systems often use high-end (or multiple) OCR engines and elaborate data validation routines to extract hand-written or poor-quality print from forms for entry into a database.
Fragmented Data
Live data that has been disseminated and stored in multiple areas on a single hard drive or disc. See De-fragment.
FTP
File Transfer Protocol. An Internet protocol that enables the transfer of files between computers over a network or the Internet.
Full Duplex
Data communications devices that allow full speed transmission in both directions at the same time.
Full Path
A path name description that includes the drive, starting or root directory, all attached subdirectories, and ending with the file or object name.
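As an illustration of the parts listed above, the following minimal sketch splits a full path into its components using Python's standard `pathlib` module. The path itself is a made-up example, not one taken from any real system.

```python
from pathlib import PureWindowsPath

# Hypothetical full path, used only for illustration.
full_path = PureWindowsPath(r"C:\Cases\Smith\Email\inbox.pst")

drive = full_path.drive               # the drive, "C:"
root = full_path.root                 # the root directory separator
directories = full_path.parts[1:-1]   # subdirectories between root and file
file_name = full_path.name            # the ending file or object name
```

`PureWindowsPath` only parses the text of the path; it does not touch the file system, so the sketch behaves the same on any operating system.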
Full-text Indexing
Every word in the document is indexed into a master word list with pointers to the documents and pages where each occurrence of the word appears.
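The master word list described above is commonly implemented as an inverted index. The sketch below is a simplified illustration, not a description of any particular product: the function name `build_index` and the sample documents are assumptions, and words are reduced to lowercase letters and digits.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of (document id, page number) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((doc_id, page_number))
    return index

# Hypothetical collection: each document is a list of page texts.
documents = {
    "memo-1": ["Budget review for Q3", "Final budget figures"],
    "letter-2": ["Review of the contract"],
}
index = build_index(documents)
budget_hits = index["budget"]   # pages of memo-1 that mention "budget"
```

A full-text search then becomes a lookup in `index` rather than a scan of every document.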
Full-text Search
The ability to search a data file for specific words, numbers, and/or combinations or patterns thereof by utilizing the Full-text Index.
Fuzzy Search
Subjective content searching (as compared to word searching of objective data). Fuzzy searching lets the user find documents where word matching does not have to be exact, even if the words searched are misspelled due to optical character recognition (OCR) errors.
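One simple way to approximate inexact matching is to score candidate words by similarity to the search term. The sketch below uses Python's standard `difflib` module; the vocabulary, the helper name `fuzzy_find`, and the 0.8 cutoff are illustrative assumptions, and commercial review platforms may use different techniques.

```python
import difflib

def fuzzy_find(term, vocabulary, cutoff=0.8):
    """Return vocabulary words similar to term, best match first."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=5, cutoff=cutoff)

# Hypothetical word list; "c0ntract" simulates an OCR error (zero for the letter o).
vocabulary = ["c0ntract", "contract", "contact", "invoice"]
matches = fuzzy_find("contract", vocabulary)
```

With this cutoff the OCR-damaged "c0ntract" is found, but so is the unrelated word "contact", which shows why fuzzy hits generally need review.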
G
GAL
Global Address List (Microsoft Outlook). Directory of all Microsoft Exchange users and distribution lists to whom messages can be addressed. The administrator creates and maintains this list. The GAL may also contain public folder names.
Ghost
See Bit Stream Backup.
GIF
Graphics Interchange Format. A computer compression format for pictures.
Gigabyte (GB)
A measure of computer data storage capacity. Equal to one billion (1,000,000,000) bytes.
GMT Timestamp
Identification of a file using Greenwich Mean Time as the central time authentication method.
GPS Generated Timestamp
Timestamp identifying time as a function of its relationship to Greenwich Mean Time.
Gray Scale
The use of many shades of gray to represent an image. Continuous-tone images, such as black-and-white photographs, use an almost unlimited number of shades of gray. Conventional computer hardware and software, however, can only represent a limited number of shades of gray (typically 16 or 256).
Grayscale
See Scale-to-Gray.
Groupware
Software designed to operate on a network and allow several people to work together on the same documents and files.
GUI
Graphical User Interface. A set of screen presentations and metaphors that utilize graphic elements such as icons in an attempt to make an operating system easier to use.
H
Hacker
Someone who breaks into computer systems in order to steal, change, or destroy information.
Half Duplex
Transmission systems which can send and receive, but not at the same time.
Halftone
See Dithering.
Hard Disc
A magnetic disc on which data can be stored. As opposed to Floppy Disc. See also Disc.
Hard Disc Drive
The primary storage unit on PCs, consisting of one or more magnetic media platters on which digital data can be written and erased magnetically.
Hard Drive
The primary storage unit on PCs, consisting of one or more magnetic media platters on which digital data can be written and erased magnetically.
Hard Drive Degaussing
A procedure that reduces the magnetic flux of a medium to virtual zero by applying a reverse magnetizing field. See also Wipe Drives.
Hard Drive Destruction
The process of physically damaging a drive so that it is unable to be utilized within a computer and no known recovery methodology can retrieve data from it. The most effective destruction process is to shred the hard drive into minuscule pieces. See also Wipe Drives.
Hard Drive Overwriting
The process of replacing data (information) on a target hard drive with meaningless data in such a way that recovery of the meaningful information is impossible. DoD standards for overwriting drives include software application specifications, technician training, certification, and random sampling. See also Wipe Drives.
Harvesting
The process of retrieving or collecting electronic data from storage media or devices; an E-Discovery vendor “harvests” electronic data from computer hard drives, file servers, CDs, and backup tapes for processing and load to storage media or a database management system.
Hash
A mathematical algorithm that represents a unique value for a given set of data, similar to a digital fingerprint. Common hash algorithms include MD5 and SHA.
Hash Coding
The creation of a digital fingerprint that represents the binary content of a file unique to every electronically generated document; assists in subsequently ensuring that data has not been modified. See also Data Verification, Digital Fingerprint, and File Level Binary Comparison.
Hash Function
A function used to create a hash value from binary input. The hash is substantially smaller than the text itself, and is generated by the hash function in such a way that it is extremely unlikely that some other input will produce the same hash value.
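The following minimal sketch shows a hash function at work, using the MD5 and SHA-256 implementations in Python's standard `hashlib` module. The sample byte strings and the helper name `file_hashes` are assumptions made for illustration.

```python
import hashlib

def file_hashes(data: bytes) -> dict:
    """Return hexadecimal MD5 and SHA-256 hash values for the given bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# Hypothetical content: the altered copy differs by a single character.
original = b"Quarterly report, final version."
altered = b"Quarterly report, final version!"

original_hashes = file_hashes(original)
altered_hashes = file_hashes(altered)
unchanged = file_hashes(original) == original_hashes   # same input, same hash
```

The hash values have a fixed length regardless of input size, and changing one character produces a different value, which is what makes them useful for verifying that data has not been modified.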
Head
Each platter on a hard drive contains a head for each side of the platter. The heads are devices that ride very closely to the surface of the platter and allow information to be read from and written to the platter.
Hexadecimal
A number system with a base of 16. The digits are 0-9 and A-F, where F equals the decimal value of 15.
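A short sketch of converting between hexadecimal and decimal with Python built-ins. The values are arbitrary examples.

```python
# Hexadecimal text to a decimal integer.
fifteen = int("F", 16)        # the digit F has the decimal value 15
two_fifty_five = int("FF", 16)

# Decimal integer back to hexadecimal text.
hex_text = format(255, "X")

# A run of hexadecimal digits is often used to display raw bytes.
raw_bytes = bytes.fromhex("4D5A")
```

Each pair of hexadecimal digits represents one byte, which is why forensic tools typically display file contents in this form.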
Hidden Files or Data
Files or data not visible in the file directory; cannot be accessed by unauthorized or unsophisticated users. Some operating system files are hidden to prevent inexperienced users from inadvertently deleting or changing these essential files. See also Steganography.
Hierarchical Storage Management (HSM)
Software that automatically migrates files from online to near-line storage media, usually on the basis of the age or frequency of use of the files.
Hold
See Legal Hold.
Hollerith
Old-style punch cards that contained encoded data.
Horizontal De-duplication
A way to identify documents that are duplicated across multiple custodians or other production data sets. See De-duplication.
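One common way to detect such duplicates is to compare hash values of file contents across custodians. The sketch below assumes a simplified in-memory collection; the custodian names, file names, and the function `find_cross_custodian_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict

def find_cross_custodian_duplicates(collection):
    """Group identical files that appear under more than one custodian."""
    by_hash = defaultdict(list)
    for custodian, files in collection.items():
        for name, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            by_hash[digest].append((custodian, name))
    # Keep only groups whose copies span at least two different custodians.
    return [
        sorted(copies)
        for copies in by_hash.values()
        if len({custodian for custodian, _ in copies}) > 1
    ]

# Hypothetical data set: two custodians hold the same policy document.
collection = {
    "alice": {"policy.doc": b"Travel policy v2", "notes.txt": b"Call Bob"},
    "bob": {"policy-copy.doc": b"Travel policy v2"},
}
duplicates = find_cross_custodian_duplicates(collection)
```

Here the match is on content only, so the differing file names do not prevent the two copies from being grouped.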
Host
In a network, the central computer which controls the remote computers and holds the central databases.
HP-PCL & HPGL
Hewlett-Packard graphics file formats.
HTML
HyperText Markup Language. The tag-based ASCII language used to create pages on the Web.
HTTP
HyperText Transfer Protocol. The underlying protocol used by the World Wide Web. HTTP defines how messages are formatted and transmitted, and what actions Web servers and browsers should take in response to various commands. For example, when you enter a URL in your browser, this actually sends an HTTP command to the Web server directing it to fetch and transmit the requested Web page.
Hub
A network device that connects multiple computers or peripherals together and allows them to share data. A central unit that repeats and/or amplifies data signals being sent across a network.
Hyperlink
A link—usually appearing as a highlighted word or picture within a hypertext document—that when clicked changes the active view, possibly to another place within the same document or view, or to another document altogether, usually regardless of the application or environment in which the other document or view exists.
HyperText
Text that includes links or shortcuts to other documents or views, allowing the reader to easily jump from one view to a related view in a non-linear fashion.
I
Icon
In a GUI, a picture or drawing which is activated by “clicking” a mouse to command the computer program to perform a predefined series of events.
ICR
Intelligent Character Recognition. The conversion of scanned images (bar codes or patterns of bits) to computer recognizable codes (ASCII characters and files) by means of software/programs which define the rules of and algorithms for conversion.
IDE
Integrated Drive Electronics. An engineering standard for interfacing PCs and hard discs.
IEEE
Institute of Electrical and Electronics Engineers. An international association which sponsors meetings, publishes a number of journals, and establishes standards.
ILM
Information Lifecycle Management.
Image
To image a hard drive is to make an identical copy of the hard drive, including empty sectors. Also known as creating a “mirror image” or “mirroring” the drive. See Forensic Copy.
Image Enabling
A software function that creates links between existing applications and stored images.
Image File Format
See File Format and Format.
Image Key
The name of a file created when a page is scanned in a collection.
Image Processing
To capture an image or representation (usually from electronic data in native format), enter it in a computer system, and process and manipulate it. See also Native Format.
Import
To bring data into an environment or application that has been exported from another environment or application.
Inactive Record
Records related to closed, completed, or concluded activities. Inactive Records are no longer routinely referenced, but must be retained in order to fulfill reporting requirements or for purposes of audit or analysis. Inactive records generally reside in a long-term storage format remaining accessible for purposes of business processing only with restrictions on alteration. In some business circumstances, inactive records may be reactivated.
Index
The searchable catalog of documents created by search engine software. Index is often used as a synonym for search engine. Also known as "category."
Index/Coding Fields
Database fields used to categorize and organize documents. Often user-defined, these fields can be used for searches.
Indexing
Universal term for Coding and Data Entry.
Information
Facts, data, or instructions in any medium or form.
Input device
Any peripheral that allows a user to communicate with a computer by entering information or issuing commands (e.g., keyboard).
Instant Messaging (IM)
A form of electronic communication that involves immediate correspondence between two or more users who are all online simultaneously.
Interlaced
TV & CRT pictures must constantly be “refreshed.” To interlace is to refresh every other line on each pass. Since only half the information displayed is updated each cycle, interlaced displays are less expensive than “non-interlaced” displays. However, interlaced displays are subject to jitters. The human eye can usually detect flicker in displayed images that are completely refreshed fewer than 30 times per second.
Interleave
To arrange data in a noncontiguous way to increase performance. When used to describe disc drives, it refers to the way sectors on a disc are organized. In one-to-one interleaving, the sectors are placed sequentially around each track. In two-to-one interleaving, sectors are staggered so that consecutively numbered sectors are separated by an intervening sector. The purpose of interleaving is to make the disc drive more efficient. The disc drive can access only one sector at a time, and the disc is constantly spinning beneath the read/write head.
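The difference between one-to-one and two-to-one interleaving can be sketched by computing the physical order of sectors around a track. The function name `sector_layout` and the eight-sector track are assumptions chosen for illustration; real drives use more sectors per track.

```python
def sector_layout(sector_count, interleave):
    """Return the physical order of logical sector numbers around one track."""
    layout = [None] * sector_count
    position = 0
    for sector in range(sector_count):
        # Skip slots that were already filled on an earlier pass around the track.
        while layout[position] is not None:
            position = (position + 1) % sector_count
        layout[position] = sector
        position = (position + interleave) % sector_count
    return layout

one_to_one = sector_layout(8, 1)   # sectors placed sequentially
two_to_one = sector_layout(8, 2)   # consecutive sectors separated by one slot
```

In the two-to-one layout, sector 1 sits two slots after sector 0, with another sector in between, which gives the drive time to process one sector before the next one arrives.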
Internal Enquiries
A close examination of a matter in a search for information or truth that is internal to a company.
Internet
A worldwide network of networks that all use the TCP/IP communications protocol and share a common address space. It supports services such as email, the World Wide Web, file transfer, and Internet Relay Chat. Also known as “the net,” “the information superhighway,” and “cyberspace.”
Internet Assigned Numbers Authority
The organization that oversees the global allocation of IP (Internet Protocol) addresses.
Internet Protocol
See IP Address.
Internet Publishing
Specialized imaging software that allows documents to be published on the Internet.
Internet Relay Chat (IRC)
A form of real-time Internet chat or synchronous conferencing. It is mainly designed for group (many-to-many) communication in discussion forums called channels, but also allows one-to-one communication and data transfers via private message. IRC was created by Jarkko Oikarinen in late August 1988 to replace a program called MUT (MultiUser talk) on a BBS called OuluBox in Finland.
Inter-partition Space
Unused sectors on a track located between the start of the partition and the partition boot record. This space is important because it is possible for a user to hide information here.
Intranet
A private network that uses Internet-related technologies to provide services within an organization.
IP Address
A string of four numbers/groups of numbers separated by periods used to represent a computer on the Internet.
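The four-number form described above is the IPv4 dotted notation. The sketch below checks candidate strings using Python's standard `ipaddress` module; the helper name `is_ipv4` and the sample addresses are illustrative assumptions.

```python
import ipaddress

def is_ipv4(text):
    """Return True if text is a valid dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

valid = is_ipv4("192.168.1.10")      # four groups, each between 0 and 255
too_large = is_ipv4("256.1.1.1")     # 256 does not fit in one group
too_short = is_ipv4("192.168.1")     # only three groups

# The same address expressed as the single number it represents.
as_integer = int(ipaddress.IPv4Address("192.168.1.10"))
```

Each of the four groups represents one byte, so every group must fall in the range 0 to 255.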
IPC
Image Processing Card. A board mounted in the computer, scanner, or printer that facilitates the acquisition and display of images. The primary function of most IPCs is the rapid compression and decompression of image files.
IPX/SPX
Communications protocol used by Novell networks.
IS/IT
Information Systems or Information Technology. Usually refers to the people who make computers and computer systems run.
ISA
Industry Standard Architecture.
ISDN
Integrated Services Digital Network. An all-digital network that can carry data, video, and voice.
ISIS and TWAIN Scanner Drivers
Specialized applications used for communication between scanners and computers.
ISO
International Organization for Standardization.
ISO 9660 CD Format
The International Organization for Standardization format for creating CD-ROMs that can be read worldwide.
ISP
Internet Service Provider. A business that delivers access to the Internet.
IT Infrastructure
The overall makeup of business-wide technology operations, including mainframe operations, stand-alone systems, email, networks (WAN and LAN), internet access, customer databases, enterprise systems, application support, regardless of whether managed, utilized, or provided locally, regionally, globally, etc., or whether performed or located internally or by outside providers (outsourced to vendors). The IT Infrastructure also includes applicable standard practices and procedures, such as backup procedures, versioning, resource sharing, retention practices, janitor program utilization, and the like.
ITU
International Telecommunication Union. An international organization under the UN headquartered in Geneva concerned with telecommunications that develops international data communications standards; known as CCITT prior to March 1, 1993. See http://www.itu.int.
J
Janitor Program
An application which runs at scheduled intervals to manage business information by deleting, transferring, or archiving online data (such as email) which is at or past its scheduled active life. Janitor programs are sometimes referred to as “agents”—software that runs autonomously “behind the scenes” on user systems and servers to carry out business processes according to pre-defined rules. Janitor programs must include a facility to support disposition and process holds.
Java
Sun Microsystems’ Java is a platform-independent programming language for adding animation and other actions to websites.
Jaz Drive
A removable disc drive. A Jaz drive holds up to 2 GB of data. Commonly used for backup storage as well as everyday use.
JMS
Jukebox Management Software. See also Jukebox.
Journal
A chronological record of data processing operations that may be used to reconstruct a previous or an updated version of a file. In database management systems, it is the record of all stored data items that have values changed as a result of processing and manipulation of the data.
Journaling
A function of email systems (such as Microsoft Exchange and Lotus Notes) that copies sent and received items into a second information store for retention or preservation. Because Journaling takes place at the information store (server) level rather than at the mailbox (client) level, some message-related metadata, such as user foldering (what folder the item is stored in within the recipient’s mailbox) and the status of the “read” flag, is not retained in the journaled copy. The journaling function stores items in the system’s native format, unlike email archiving solutions, which use proprietary storage formats that are designed to reduce the amount of storage space required. Journaling systems also lack the sophisticated search and retrieval capabilities contained in email archiving solutions.
JPEG
Joint Photographic Experts Group. An image compression standard for photographs.
Jukebox
A mass storage device that holds optical discs and loads them into a drive. An automated disc changer that provides high-performance, centralized storage for multifunction CD-ROMs and optical discs.
Jump Drive
See Key Drive.
K
Kerning
Adjusting the spacing between two letters.
Key Drive
A small, removable data storage device that uses flash memory and connects via a USB port. Key drives are also known as keychain drives, thumb drives, jump drives, or USB flash drives. They can be imaged and may contain residual data.
Key Field
Database fields used for document searches and retrieval.
Keystroke Monitoring
A form of user surveillance in which the actual character-by-character traffic (the user's keystrokes) is monitored, analyzed, and/or logged for future reference.
Keyword Search
A search for documents containing one or more words that are specified by a user.
Keywords
Words designated by a user as important for searching purposes.
Kilobyte (K)
One kilobyte of data is equal to one thousand (1,000) bytes.
Kofax Board
The generic term for a series of image-processing boards manufactured by Kofax Imaging Processing. These are used between the scanner and the computer, and perform real-time image compression and decompression for faster image viewing, image enhancement, and corrections to the input to account for conditions such as document misalignment.
L
LAN
Local Area Network. A network of computers that generally spans a small area, such as a single building.
Landscape Mode
The image is represented on the page or monitor such that the width is greater than the height. The opposite of portrait mode.
Laser Disc
Same as an optical CD, except 12” in diameter.
Laser Printing
A beam of light hits an electrically charged drum and causes a discharge at that point. Toner is then applied which sticks to the non-charged areas. Paper is pressed against the drum to form the image and is then heated to dry the toner. Used in laser printers and copying machines.
Latency
The time it takes to read a disc (or jukebox), including the time to physically position the media under the read/write head, seek the correct address, and transfer it.
Latent Data
Deleted files and other data that are inaccessible without specialized forensic tools and techniques. Until overwritten, these data reside on media such as a hard drive in unused space and other areas available for data storage. Also known as "ambient data."
Leading/Ledding
The amount of space between lines of printed text.
Legacy Data
Information in which an organization may have invested significant resources and that was produced and/or stored on software and/or hardware that has since become obsolete.
Legal Hold
A communication issued as a result of current or anticipated litigation, audit, government investigation, or other such matter that suspends the normal disposition or processing of records. The specific communication to business or IT organizations may also be called a “hold,” “preservation order,” “suspension order,” “freeze notice,” “hold order,” or “hold notice.”
Level Coding
Used in bibliographical coding to facilitate different treatment, such as prioritization or more thorough extraction of data, for different categories of documents, such as by type or source.
LFP
IPRO Tech’s image cross-reference file; an ASCII delimited text file required for cross-reference of images to data.
Lifecycle
The life span of a record from its creation or receipt to its final disposition. It is usually described in three stages: creation, maintenance and use, and archive to final disposition.
Link
See Hyperlink.
Load file
A file that relates to a set of scanned images and indicates where individual pages belong together as documents. A load file may also contain data relevant to the individual documents, such as metadata or coded data. Load files must be obtained and provided in prearranged formats to ensure transfer of accurate and usable images and data.
Local area network
See LAN.
Logfile
The Microsoft Windows NT Logfile, officially designated as $Logfile, is a special system file used by Microsoft Windows NT to keep track of what it is doing. If the system fails, NT uses the information stored in the Logfile to stabilize itself. The Logfile is similar to the Windows NT Page File since user information can pass through the Logfile unbeknownst to the user. Like the NT Page file, the Logfile should be analyzed for security leaks and investigative leads.
Logical File Space
The actual amount of space occupied by a file on a hard drive. The amount of logical file space differs from the physical file space because when a file is created on a computer, a sufficient number of clusters (physical file space) are assigned to contain the file. If the file (logical file space) is not large enough to completely fill the assigned clusters (physical file space) then some unused space will exist within the physical file space, known as slack space or unallotted space.
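The relationship between logical file space, physical file space, and slack can be worked through with simple arithmetic. In the sketch below the 4,096-byte cluster size and the 10,000-byte file are assumed example values; actual cluster sizes depend on the file system and volume.

```python
def physical_size(logical_size, cluster_size=4096):
    """Bytes allocated to a file: whole clusters, rounded up."""
    clusters = -(-logical_size // cluster_size)   # ceiling division
    return clusters * cluster_size

def slack_bytes(logical_size, cluster_size=4096):
    """Unused bytes between the end of the file and the end of its last cluster."""
    return physical_size(logical_size, cluster_size) - logical_size

# Hypothetical 10,000-byte file on a volume with 4,096-byte clusters.
allocated = physical_size(10_000)   # three clusters
slack = slack_bytes(10_000)
exact_fit = slack_bytes(8_192)      # a file that fills its clusters exactly
```

A 10,000-byte file needs three clusters (12,288 bytes), leaving 2,288 bytes of slack in which remnants of earlier data may persist.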
Logical Volume
An area on the hard drive that has been formatted for file storage. A hard drive may contain a single or multiple volumes.
Lossless Compression
Exact reconstruction of an image, bit-by-bit, with no loss of information. The opposite of lossy compression. See also Compression.
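The defining property of lossless compression is that decompressing returns exactly the original bytes. The sketch below demonstrates this round trip with Python's standard `zlib` module, which implements the DEFLATE method; it is used here only as a convenient example of a lossless scheme, and the sample data is made up.

```python
import zlib

# Hypothetical, highly repetitive data that compresses well.
original = b"AAAAABBBBBCCCCC" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

is_lossless = restored == original          # every bit is recovered
saved_bytes = len(original) - len(compressed)
```

A lossy method, by contrast, would not guarantee that `restored` equals `original`.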
Lossy Compression
Reduces storage size of image by reducing the resolution and color fidelity while maintaining minimum acceptable standard for general use. The opposite of lossless compression. See also Compression.
LPI
Lines Per Inch. Number of horizontal and vertical lines (wires) in a halftone screen. The more lines, the finer the screen and the smaller the dot (for printers) or pixel (for monitors) sizes. As a general rule, newspapers print at 65-85 lpi.
LTO
Linear Tape-Open. A type of backup tape which can hold as much as 400 GB of data, or 600 CDs depending on the data file format.
LZW
Lempel-Ziv-Welch. A common lossless compression standard for computer graphics, used for most TIFF files. Typical compression ratios are 4:1.
M
MAC Address
In computer networking, a Media Access Control address (MAC address) is a unique identifier attached to most network adapters (NICs). It is a number that acts like a name for a particular network adapter. For example, the network cards (or built-in network adapters) in two different computers will have different names, or MAC addresses, as would an Ethernet adapter and a wireless adapter in the same computer, and as would multiple network cards in a router.
Magnetic/Optical Storage Media
Includes, but is not limited to, hard drives, backup tapes, CD-ROMs, DVD-ROMs, Jaz, and Zip drives.
Magneto-Optical Drive
A drive that combines laser and magnetic technology to create high-capacity erasable storage.
Mailbox
An area on a storage device where email is placed. In email systems, each user has a private mailbox. When the user receives email, the mail system automatically puts it in the appropriate mailbox.
Make-Available Production
A process whereby what is usually a large universe of all potentially responsive documents is made available to the requestor; from this universe, the requestor then reviews and selects or tags the documents which they wish to obtain, and the producing party produces only the selected documents to the requestor. This is sometimes done under an agreement protecting against privilege and confidentiality waiver during the initial make-available production; and the producing party, after the requestor has selected the documents they wish to obtain, reviews only the selected documents for privilege and confidentiality before the selected documents are physically produced to the requestor.
MAPI
Messaging Application Programming Interface. A Windows software standard that has become a popular email interface used by MS Exchange, GroupWise, and other email packages.
MAPI Mail Near-Line
Documents stored on optical discs or compact discs that are housed in the jukebox or CD changer and can be retrieved without human intervention.
Marginalia
Handwritten notes in the margin of the page in documents.
Master Boot Record
See Boot Sector.
Master File Table (MFT)
A unique system file that essentially acts as a database, containing information on all the files and subdirectories located within the NTFS logical volume (partition). There is at least one record for every file and subdirectory on the NTFS logical volume and each one is 1024 bytes in length and contains information, known as attributes, that tell the system how to deal with the file or directory associated with the record. If the full 1024 bytes are not used, the record can contain information from previous files, which is known as MFT slack. Knowledge of this MFT slack is vital to investigators because a computer forensics utility that captures file slack does not capture MFT slack.
In addition, the MFT sometimes stores the actual file data along with all the system data relating to the file, which is known as resident data. Resident data can have significant meaning concerning computer security issues regarding the potential leakage of sensitive data. If the MFT is corrupted, a drive may be unusable, yet data may be retrievable using forensic methods.
Mastering
Making many copies of a disc from a single master disc.
MCA
Micro Channel Architecture. An IBM bus standard.
MD5
Message-digest algorithm meant for digital signature applications where a large message has to be compressed in a secure manner before being signed with the private key.
MDE
Magnetic Disc Emulation. Software that makes a jukebox look and operate like a hard drive such that it will respond to all the I/O commands ordinarily sent to a hard drive.
Media
The physical material used to store electronic data. Media includes hard drives, backup tapes, computer discs, CD, DVD, PDA memory, etc.
Media Conversion
The relocation of data from one type of media to another such as tape to CD.
Megabyte (Meg)
A measurement of computer data. One megabyte of data is equal to one million (1,000,000) bytes.
Memory
Data storage in the form of chips, or the actual chips used to hold data; “storage” is used to describe memory that exists on tapes or discs. See RAM and ROM.
Menu
A list of options, each of which performs a desired action such as choosing a command or applying a particular format to a part of a document.
Merge
The process of combining various email files into one file for de-duplication purposes.
Message Header
Generally contains the identities of the author and recipients, the subject of the message, and the date the message was sent.
Metadata
Typically referred to by the less informative shorthand phrase “data about data,” it describes the content, quality, condition, history, and other characteristics of the data. Metadata is information about a particular data set which may describe, for example, how, when, and by whom it was received, created, accessed, and/or modified and how it is formatted. Some metadata, such as file dates and sizes, can easily be seen by users; other metadata can be hidden or embedded and unavailable to computer users who are not technically adept. Metadata is generally not reproduced in full form when a document is printed.
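As a small illustration of the file-system metadata mentioned above (file dates and sizes), the sketch below reads a few fields with Python's standard `os.stat` call. The helper name `basic_metadata` and the temporary sample file are assumptions; embedded application metadata (for example, a document's author field) is not covered and would require format-specific tools.

```python
import os
import tempfile
from datetime import datetime, timezone

def basic_metadata(path):
    """Return size and timestamps that the file system records for a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed_utc": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }

# Create a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    metadata = basic_metadata(sample_path)
```

Note that simply opening or copying a file can change some of these values, which is one reason forensically sound collection procedures matter.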
Metadata Comparison
A method of de-duplication that compares file metadata and ignores content. See De-duplication.
MFT
See Master File Table.
MICR
Magnetic Ink Character Recognition. The process used by banks to encode checks.
Microfiche
Sheet microfilm (4 by 6”) containing reduced images of 270 pages or more in a grid pattern.
Migrated Data
Information that has been moved from one database or format to another, usually as a result of a change from one hardware or software technology to another.
Migration
The relocation of files to another computer application or platform; may require conversion to a different format.
Mirror Image
Used in computer forensic investigations and some electronic discovery investigations, a mirror image is a bit-by-bit copy of a computer hard drive that ensures the operating system is not altered during the forensic examination. May also be referred to as “disc mirroring” or as a “forensic copy.”
Mirroring
The duplication of data for purposes of backup or to distribute Internet or network traffic among several servers with identical data. See also Disc Mirroring.
MIS
Management Information Systems. In general, a computer-based system that provides IT personnel with the necessary tools for categorizing, assessing, and efficiently running an organization's computer systems.
Modem
A device that allows a computer system to transmit data over telephone or cable lines.
Monochrome
Displays capable of only two colors, usually black and white or black and green.
Mosaic
A web browser popular before the introduction of Netscape and Internet Explorer.
Mount/Mounting
The process of making offline data available for online processing. For example, placing a magnetic tape in a drive and setting up the software to recognize or read that tape. The terms “load” and “loading” are often used in conjunction with, or synonymously with, “mount” and “mounting” (as in “mount and load a tape”). “Load” may also refer to the process of transferring data from mounted media to another media or to an online system.
MPEG-1 & -2
Two different standards for full motion video to digital compression/decompression techniques advanced by the Moving Picture Experts Group. MPEG-1 compresses 30 frames/second of full-motion video down to about 1.5 Mbits/sec from several hundred megabytes. MPEG-2 compresses the same files down to about 3.0 Mbits/sec and provides better image quality.
MS-DOS
Microsoft (MS)-Disc Operating System. Used in PCs as the control system.
MTBF
Mean Time Between Failure. Average time between failures used to compute the reliability of devices and equipment.
MTTR
Mean Time to Repair. The higher the MTTR (average time to repair), the more costly and difficult to fix.
Multimedia
The combined use of different media; integrated video, audio, text, and data graphics in digital form.
Multisynch
Analog video monitors that can receive a wide range of display resolutions, usually including TV (NTSC). Color analog monitors accept separate red, green, and blue (RGB) signals.
N
Native Environment
The original configuration (software, passwords, server configuration, etc.) of a backup tape or email system.
Native File
A file in its original file format that has not been converted to a digital image or other file format.
Native Format
Electronic documents have an associated file structure defined by the original creating application. This file structure is referred to as the “native format” of the document. Because viewing or searching documents in the native format may require the original application (e.g., viewing a Microsoft Word document may require the Microsoft Word application), documents are often converted to a standard file format (e.g., TIFF) as part of electronic document processing.
Natural Language Search
A manner of searching that permits the use of plain language without special connectors or precise terminology, such as “Where can I find information on William Shakespeare?” as opposed to formulating a search statement (such as “information” and “William Shakespeare”).
Near-line Data
A term used to refer to data or a robotic storage device (robotic library) that houses removable media, uses robotic arms to access the media, and uses multiple read/write devices to store and retrieve records. E.g., optical discs.
Near-line Data Storage
Storage in a system that is not a direct part of the network in daily use, but that can be accessed through the network. There is usually a small time lag between the request for data stored in near-line media and its being made available to an application or end-user. Making near-line data available will not require human intervention (as opposed to “off-line” data which can only be made available through human actions).
Nesting
Document nesting occurs when one document is inserted within another document (e.g., an attachment is nested within an email; graphics files are nested within a Microsoft Word document).
Net
See Internet.
Network
A group of computers or devices linked together to allow data and resources to be shared by authorized users.
Network Gear
Refers to the actual hardware used in the operation of networks; for example, routers, switches, and hubs.
Network Operating System
Software which directs the overall activity of networked computers.
NIST
National Institute of Standards and Technology. A federal technology agency that works with industry to develop and apply technology measurements and standards.
Node
Any device connected to a network. PCs, servers, and printers are all nodes on the network.
Non-Interlace
When each line of a video image is scanned separately. Computer monitors use noninterlaced video.
NOS
Network Operating System. See Operating System.
NSF
Lotus Notes Format Database File (e.g., database.nsf). Can be either an email database or the traditional type of fielded database.
O
Objects
In programming terminology, an object is a freestanding block of code that defines the properties of something. Objects are created and used in a high-level method of programming called object-oriented programming (OOP). OOP involves giving programming objects characteristics that can be transferred to, added to, and combined with other objects to make a complete program.
OCR
Optical Character Recognition. A technology process that translates and converts printed matter on an image into a format that a computer can manipulate (e.g., ASCII codes) and, therefore, renders that text searchable. OCR software evaluates scanned data for shapes it recognizes as letters or numerals. All OCR systems include an optical scanner for reading text, and software for analyzing images. Most OCR systems use a combination of hardware (specialized circuit boards) and software to recognize characters, although some inexpensive systems operate entirely through software. Advanced OCR systems can read text in a large variety of fonts, but still have difficulty with handwritten text. OCR technology relies upon the quality of the imaged material, the conversion accuracy of the software, and the quality control process of the provider. The process is generally acknowledged to be only 80-85% accurate. See also OWR.
Official Record Owner
See Record Owner.
Offline
When computers and other devices are not connected to the network. The opposite of online.
Offline data
The storage of electronic data outside the network in daily use (e.g., on backup tapes) that is only accessible through the offline storage system, not the network.
Offline Storage
Electronic records stored or archived on removable disc (optical, compact, etc.) or magnetic tape used for making disaster-recovery copies of records for which retrieval is unlikely. Accessibility to offline media usually requires manual intervention and is much slower than online or near-line storage depending on the storage facility. The major difference between near-line data and offline data is that offline data lacks an intelligent disc subsystem and is not connected to a computer, network, or any other readily accessible system.
OLE
Object Linking and Embedding. A feature in Microsoft Windows which allows each section of a compound document to call up its own editing tools or special display features. This allows for combining diverse elements in compound documents.
Online
When computers and other devices are connected to the network. The opposite of offline.
Online Review
The culling process produces a dataset of potentially responsive documents that are then reviewed for a final selection of relevant or responsive documents and assertion of privilege exception as appropriate. Online review enables the culled dataset to be accessed via PC or other terminal device via a local network or remotely via the Internet. Often, the online review process is facilitated by specialized software, which provides additional features and functions that may include collaborative access of multiple reviewers, security, user logging, search and retrieval, document coding, redaction, and privilege logging.
Online storage
The storage of electronic data as fully accessible information in daily use on the network or elsewhere.
OOP
Object-oriented Programming. Involves giving programming objects characteristics that can be transferred to, added to, and combined with other objects to make a complete program. See also Objects.
Operating System (OS)
Software which directs the overall activity of a computer (e.g., MS-DOS®, Windows®, Linux®).
Optical Discs
Computer media similar to a compact disc that cannot be rewritten. An optical drive uses a laser to read the stored data.
Optical Jukebox
See Jukebox.
Originator
See Author.
OST
A Microsoft Outlook information store that is used to save folder information that can be accessed offline.
Overwrite
To record or copy new data over existing data, as in when a file or directory is updated. Data that is overwritten cannot be retrieved.
OWR
Optical Word Recognition (as opposed to Optical Character Recognition, OCR). OWR is a technology developed by iArchives that utilizes three OCR engines to analyze all characters in a string or word. This technology increases recognition across entire documents. It is especially beneficial when the first letter of a string may have been misread during a single OCR pass, which greatly reduces search engine hits. Search engines generally work left-to-right, and a first-letter misinterpretation results in a null find. The three-pass OWR analysis, combined with a retrieval system that uses "fuzzy" logic for an entire string of characters, will be more likely to find a search term. See also OCR.
P
PAB
Personal Address Book. A Microsoft Outlook list of recipients created and maintained by an individual user for personal use. The PAB is a subset of the global address list (GAL).
PackBits
A compression scheme that originated with the Macintosh. Suitable only for black and white.
Packet
A unit of data sent across a network that may contain identifying and routing information. When a large block of data is to be sent over a network, it is broken up into several packets, sent, and then reassembled at the other end. The exact layout of an individual packet is determined by the protocol being used.
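As an illustration only, the splitting and reassembly described above can be sketched in Python. The 4-byte packet size, the (sequence number, payload) layout, and the function names are assumptions made for this example; real protocols define their own packet layouts.

```python
def split_into_packets(data: bytes, size: int = 4):
    """Break data into (sequence_number, payload) packets of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(packets):
    """Rebuild the original data, even if the packets arrive out of order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


# Hypothetical message: split it, deliver the packets in reverse order, rebuild it.
packets = split_into_packets(b"electronic discovery")
restored = reassemble(reversed(packets))
```

The sequence number stands in for the identifying information a packet may carry; it is what allows the receiving end to restore the original order.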
Page
A single image of the equivalent of one piece of paper. One or more pages make up a document.
Page File
A file used to temporarily store code and data for programs that are currently running. This information is left in the swap file after the programs are terminated and may be retrieved using forensic techniques. Also referred to as a swap file and paging file.
Paper Discovery
Paper discovery refers to the discovery of writings on paper that can be read without the aid of devices. The opposite of electronic discovery (or e-discovery).
Parallel
Transmission of all the bits (e.g., in a character) at the same time. If the character has eight bits, there are eight wires. Faster and more expensive than serial where the eight bits would be sent, “sideways”, one at a time.
Parent-child Relationships
A chain of documents that stems from a single email or storage folder. These types of relationships are primarily encountered when a party is faced with a discovery request for email.
Partition
An individual section of computer storage media such as a hard drive. For example, a single hard drive may be divided into several partitions. When a hard drive is divided into partitions, each partition is designated by a separate drive letter, e.g., C, D, etc.
Partition Gap
One physical hard disc drive can be partitioned to contain one or more logical drives when computer users utilize programs such as FDisk or Partition Magic. On large hard disc drives, it is not uncommon to have multiple partitions that can be used to store data in different logical drives, like drives C or D. When multiple partitions are involved, it is possible for gaps to exist between the partitions. These gaps are referred to as partition gaps, and they can be used for covert data storage. Partition gaps can also contain legacy data in sectors that were previously associated with data files stored on prior partitions, which can occur when physical hard disc drives are repartitioned during the upgrade of a computer. For these reasons, partition gaps can be a source of computer security risks and data hiding.
Partition Table
A table that indicates each logical volume contained on a disc and its location.
Partition Waste Space
After the boot sector of each volume or partition is written to a track, it is customary for the system to skip the rest of that track and begin the actual useable area of the volume on the next track. This results in unused or “wasted” space on that track where information can be hidden. This wasted space can only be viewed with a low-level disc viewer. However, forensic techniques can be used to search these wasted space areas for hidden information.
Password
A secret code utilized, usually along with a user ID, in order to log on or gain access to a PC, network, or other secure system, site, or application.
Path
The hierarchical description of where a directory, folder, or file is located on a computer or network. In DOS and Windows systems, a path is a list of directories where the operating system looks for executable files if it is unable to find the file in the working directory. The list of directories can be specified with the PATH command. Path is also used to refer to a transmission channel, the path between two nodes of a network that a data communication follows, and the physical cabling that connects the nodes on a network.
Pattern Matching
Any process that compares one file’s content with another file’s content.
Pattern Recognition
Technology that searches data for like patterns, then flags and extracts the pertinent data, usually utilizing an algorithm. For instance, in looking for addresses, alpha characters followed by a comma and a space, followed by two capital alpha characters, followed by a space, followed by five or more digits are usually the city, state, and zip code. By programming the application to look for a pattern, the information can be electronically identified, extracted, or otherwise utilized or manipulated.
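The city, state, and zip code pattern described above could be expressed as a regular expression. The following Python sketch is one possible rendering, not a production-grade address parser; the pattern name, function name, and sample text are hypothetical.

```python
import re

# Alpha characters (spaces allowed inside the city name), a comma and a space,
# two capital letters, a space, then five or more digits.
CITY_STATE_ZIP = re.compile(r"([A-Za-z][A-Za-z ]*), ([A-Z]{2}) (\d{5,})")


def find_addresses(text):
    """Return a (city, state, zip) tuple for every match found in the text."""
    return CITY_STATE_ZIP.findall(text)


sample = (
    "Send notice to:\n"
    "Jane Roe\n"
    "Phoenix, AZ 85001\n"
    "Copy to:\n"
    "Salt Lake City, UT 84101\n"
)
matches = find_addresses(sample)
```

A pattern this simple will also flag text that merely looks like an address, so extracted results would normally be reviewed before being relied upon.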
PB
A petabyte is a measure of computer data storage capacity and is one thousand million million (1,000,000,000,000,000) bytes. See Byte.
PC
Personal computer based on a microprocessor and designed to be used by one person at a time.
PCI
Peripheral Component Interface (Interconnect). A high-speed interconnect local bus used to support multimedia devices.
PCMCIA
Personal Computer Memory Card International Association. Plug-in cards for computers (usually portables), which extend the storage and/or functionality.
PDA
Personal Digital Assistant. A handheld device that provides any of the following features: computing, telephone, fax, Internet, or networking.
PDF
Portable Document Format. A file format developed by Adobe Systems, PDFs capture formatting information from a variety of desktop publishing applications, making it possible to send formatted documents and have them appear on the recipient's monitor or printer as they were intended. To view a file in PDF format, you need Adobe Acrobat Reader, a free application distributed by Adobe Systems.
Phase Change
A method of storing information on rewritable optical discs.
Physical Disc
An actual piece of computer media, such as the hard disc or drive, floppy discs, CD-ROM discs, Zip discs, etc.
Physical File Space
When a file is created on a computer, a sufficient number of clusters (physical file space) are assigned to contain the file. If the file (logical file space) is not large enough to completely fill the assigned clusters (physical file space) then some unused space will exist within the physical file space. This unused space is referred to as file slack and can contain unused space, previously deleted/overwritten files, or fragments thereof.
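The relationship between logical file space, physical file space, and file slack is simple arithmetic. The sketch below assumes a 4,096-byte cluster, which is only one of several sizes in common use; the function name is hypothetical.

```python
def file_slack(logical_size: int, cluster_size: int = 4096):
    """Return (physical_size, slack_bytes) for a file of logical_size bytes."""
    # Round up to a whole number of clusters using integer arithmetic.
    clusters = (logical_size + cluster_size - 1) // cluster_size
    physical_size = clusters * cluster_size
    return physical_size, physical_size - logical_size


# A 10,000-byte file needs three 4,096-byte clusters.
physical, slack = file_slack(10_000)
```

In this example the file occupies 12,288 bytes of physical file space, leaving 2,288 bytes of file slack.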
PICA
One sixth (1/6) of an inch. Used to measure graphics and fonts. There are 12 points per pica; 6 picas per inch; 72 points per inch.
Picture Element
The smallest addressable unit on a display screen. The higher the resolution (the more rows and columns), the more information can be displayed.
Ping
A computer network tool used to test whether a particular host is reachable across an IP network. Ping works by sending ICMP “echo request” packets to the target host and listening for ICMP “echo reply” responses (sometimes dubbed "pong," by analogy with the table tennis game Ping-Pong). Using interval timing and response rate, ping estimates the round-trip time (generally in milliseconds, although the unit is often omitted) and the packet loss rate (if any) between hosts.
Pitch
Characters (or dots) per inch, measured horizontally.
PKI Digital Signature
A document or file may be digitally signed using a party’s private signature key, creating a digital signature that is stored with the document. Anyone can validate the signature on the document using the public key from the digital certificate issued to the signer. Validating the digital signature confirms who signed it, and ensures that no alterations have been made to the document since it was signed. Similarly, an email message may be digitally signed using commonly available client software that implements an open standard for this purpose, such as Secure Multipurpose Internet Mail Extensions (S/MIME). Validating the signature on the email can help the recipient know with confidence who sent it, and that it was not altered during transmission. See also Certificate.
Plaintext
The least formatted and therefore most portable form of text for computerized documents.
Platter
One of several components that make up a computer hard drive. Platters are thin, rapidly rotating discs that have a set of read/write heads on both sides of each platter. Each platter is divided into a series of concentric rings called tracks. Each track is further divided into sections called sectors, and each sector is sub-divided into bytes.
PMS
Pantone Matching System. A color standard in printing.
POD (Print On Demand)
Print On Demand. Document images are stored in electronic format and are available to be printed quickly and in the exact quantity required, long or short runs.
Pointer
An index entry in the directory of a disc (or other storage medium) that identifies the space on the disc in which an electronic document or piece of electronic data resides, thereby preventing that space from being overwritten by other data.
Portable Volumes
A feature that facilitates the moving of large volumes of documents without requiring copying multiple files. Portable volumes enable individual CDs to be easily regrouped, detached, and reattached to different databases for a broader information exchange.
Portrait Mode
The image is represented on the page or monitor such that the height exceeds the width. The opposite of landscape mode.
Preservation
The process of ensuring retention and protection from destruction or deletion of all potentially relevant evidence, including electronic metadata. See also Spoliation.
Preservation Notice/Order
See Legal Hold.
Printout
A printed version of text or data. Synonymous with hard copy.
Private Network
A network that is connected to the Internet but is isolated from the Internet with security measures allowing use of the network only by persons within the private network. The opposite of public network.
Privilege Data Set
The universe of documents identified as responsive and/or relevant, but withheld from production on the grounds of attorney-client privilege or work product.
Processing Data
See Image Processing.
Production
The process of delivering to another party, or making available for that party’s review, documents deemed responsive to a discovery request.
Production Data Set
The universe of documents identified as responsive to document requests and not withheld on the grounds of attorney-client privilege or work product.
Production De-duplication
Culling of a document if multiple copies of that document reside within the same production set. For example, if two identical documents are both marked responsive and non-privileged, production de-duplication ensures that only one of those documents is produced. Contrast with case de-duplication and custodian de-duplication.
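A minimal Python sketch of production de-duplication follows. It assumes that two documents are "identical" when their contents produce the same SHA-1 hash, which is one common approach rather than the only one; the function name and sample documents are hypothetical.

```python
import hashlib


def production_dedupe(documents):
    """Keep the first copy of each distinct content.

    `documents` is a list of (name, content_bytes) pairs from one production set.
    """
    seen = set()
    unique_names = []
    for name, content in documents:
        digest = hashlib.sha1(content).hexdigest()
        if digest not in seen:  # first time this content has appeared
            seen.add(digest)
            unique_names.append(name)
    return unique_names


production_set = [
    ("memo_a.txt", b"Quarterly results attached."),
    ("memo_copy.txt", b"Quarterly results attached."),
    ("letter.txt", b"Please call me."),
]
produced = production_dedupe(production_set)
```

Only memo_a.txt and letter.txt remain in the result, because memo_copy.txt has the same content as memo_a.txt.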
Production Number
See Bates Production Number.
PST
Personal Folder File. The place where Outlook stores its data (when Outlook is used without Microsoft® Exchange Server). A PST file is created when a mail account is set up. Additional PST files can be created for backing up and archiving Outlook folders, messages, forms, and files. The file extension given to PST files is .pst.
Public Network
A network that is part of the public Internet. The opposite of private network.
R
RAM
Random Access Memory. The hardware inside a computer that retains memory on a short-term basis and stores information while the user utilizes the computer.
Raster/Rasterized
Raster or Bitmap Drawing. A method of representing an image with a grid (or “map”) of dots. Typical raster file formats are GIF, JPEG, TIFF, PCX, BMP, etc.
Record
Information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business (ISO 15489-1). Collectively, the term is used to describe both documents and electronically stored information.
Record Custodian
See Custodian.
Record Lifecycle
See Lifecycle.
Record Series
A description of a particular set of records within a file plan. Each category has retention and disposition data associated with it, applied to all record folders and records within the category. (DOD 5015)
Record Submitter
The person who enters a record in an application or system. This may be, but is not necessarily, the author or the custodian.
Records Hold
See Legal Hold.
Records Management
Records Management is the planning, controlling, directing, organizing, training, promoting, and other managerial activities involving the lifecycle of information, including creation, maintenance (use, storage, retrieval), and disposal, regardless of media.
Records Manager
The records manager is responsible for the implementation of a records management program in keeping with the policies and procedures that govern that program, including the identification, classification, handling, and disposition of the organization’s records throughout their retention life. The physical storage and protection of records may be a component of this individual’s functions, but it may also be delegated to someone else.
Records Retention Period, Retention Period
The length of time a given records series must be kept, expressed as either a time period (e.g., four years), an event or action (e.g., audit), or a combination (e.g., six months after audit).
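The time-based and combination forms of a retention period can be turned into concrete dates. The Python sketch below is illustrative only; the dates, the `add_months` helper, and the rule of clamping to the last day of a shorter month are assumptions, and an actual retention schedule may count periods differently.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamped to the last day of a shorter month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Time period: four years after the record was created.
created = date(2020, 3, 15)
time_based_end = add_months(created, 4 * 12)

# Combination: six months after the audit (the triggering event).
audit_completed = date(2023, 8, 31)
event_based_end = add_months(audit_completed, 6)
```

Here the time-based period ends on 15 March 2024, and the event-based period ends on 29 February 2024 because February has no 31st day.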
Records Retention Schedule
A plan for the management of records, listing types of records and how long they should be kept; the purpose is to provide continuing authority to dispose of or transfer records to historical archives.
Records Store
See Repository for Electronic Records.
Recover, Recovery
See Restore.
Redaction
A portion of an image or document is intentionally concealed to prevent disclosure of specific portions. Often done to avoid production of privileged or irrelevant materials.
Refresh Rate
The number of times per second a display (such as on a CRT or TV) is updated.
Region (of an image)
An area of an image file that is selected for specialized processing. Also referred to as zone.
Registration
Lining up a forms image to determine which fields are where. Also, entering pages into a scanner such that they are correctly read.
Relative Path
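An implied path; that is, a path expressed relative to the current working directory rather than starting from the root directory. See Path.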
Remote Access
The ability to access and use digital information from a location off-site from where the information is physically located. For example, to use a computer, modem, and some remote access software to connect to a network from a distant location.
Render Images
To take a native format electronic file and convert it to an image that appears as the original format file as if printed to paper.
Report
Formatted output of a system providing specific information.
Repository
A centralized database stored on a computer that houses specific information.
Repository for Electronic Records
A direct-access device on which the electronic records and associated metadata are stored. Sometimes called a “records store,” “online repository,” or “records archive.”
Residual Data
See Latent Data. Also known as ambient data.
Resolution
See DPI.
Restore
To transfer data from a backup medium (such as tapes) to an online system, often for the purpose of recovery from a problem, failure, or disaster. Restoration of archival media is the transfer of data from an archival store to an online system for the purposes of processing (such as query, analysis, extraction, or disposition of that data). Archival restoration of systems may require not only data restoration but also replication of the original hardware and software operating environment. Also referred to as recovery.
Retention Schedule
See Records Retention Schedule.
Reverse Engineering
The process of analyzing a system to identify its intricacies and their interrelationships, and creating depictions of the system in another form or at a higher level. Reverse engineering is usually undertaken in order to redesign the system for better maintainability or to produce a copy of a system without utilizing the design from which it was originally produced.
Review
The culling process produces a dataset of potentially responsive documents which are then examined and evaluated for a final selection of relevant or responsive documents and assertion of privilege exception as appropriate. Also see Online Review.
Rewriteable Technology
Storage devices where the data may be written more than once—typically hard drives, floppies and optical discs.
RFC822
The standard that specifies a syntax for text messages that are sent among computer users, within the framework of email.
RGB
Red, Green, and Blue. The three primary colors in the additive color family which create all the computer color video signals for a computer’s color terminal.
RIM
Records and Information Management.
RIP
The procedures used to unbundle email collections into individual emails during the e-discovery process while preserving authenticity and ownership.
RLE
Run Length Encoded. Supporting only 256 colors, this compressed image format is most effective on images with large areas of black or white.
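The idea behind run-length encoding can be shown with a short Python sketch. It stores each run of identical values as a [value, run length] pair; this pairing and the function names are assumptions for illustration and do not reproduce the byte layout of any particular RLE image file format.

```python
def rle_encode(pixels):
    """Collapse a sequence of values into [value, run_length] pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


# One hypothetical row of a black and white image: 0 = white, 1 = black.
row = [0] * 12 + [1] * 3 + [0] * 9
encoded = rle_encode(row)
```

The 24-pixel row is reduced to three pairs, which shows why the format works best on images with large uniform areas; a row that alternates on every pixel would not shrink at all.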
ROM
Read Only Memory. The hardware in a computer that can be read but not written to. ROM contains the programming that allows a computer to boot up each time the user turns it on, and it contains essential system programs that neither the user nor the computer can erase.
Root Directory
The top level in a hierarchical file system. For example, on a PC, the root directory of the hard drive, usually C:, contains all the second-level subdirectories on that drive.
Rotary Camera
In microfilming, the papers are read on the fly with a camera that’s synchronized to the motion.
Router
A device that forwards data packets along networks. A router is connected to at least two networks, commonly two LANs or WANs or a LAN and its ISP’s network. Routers are located at gateways, the places where two or more networks connect.
Rule 16
From the Federal Rules of Civil Procedure (Fed. R. Civ. P.). Pretrial conference. Rule 16 may provide a party with an opportunity to discuss settlement without giving the appearance of having initiated the conversation.
Rule 26
From the Federal Rules of Civil Procedure (Fed. R. Civ. P.). General provisions governing discovery; duty of disclosure.
S
Sampling
Usually refers to the process of statistically testing a data set for the likelihood of relevant information. It can be a useful technique in addressing a number of issues relating to litigation, including decisions as to which repositories of data should be preserved and reviewed in a particular litigation, and determinations of the validity and effectiveness of searches or other data extraction procedures. Sampling can be useful in providing information to the court about the relative cost burden versus benefit of requiring a party to review certain electronic records.
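As a rough illustration of statistical sampling, the Python sketch below draws a simple random sample from a collection and estimates the proportion of responsive documents. The collection, the "merger" keyword test, the sample size of 400, and the normal-approximation margin of error at roughly 95% confidence are all assumptions chosen for the example, not a recommended protocol.

```python
import math
import random


def estimate_responsive_rate(documents, is_responsive, sample_size, seed=0):
    """Estimate the share of responsive documents from a simple random sample.

    Returns (rate, margin), where margin is an approximate 95% margin of error.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(documents, sample_size)
    hits = sum(1 for doc in sample if is_responsive(doc))
    rate = hits / sample_size
    margin = 1.96 * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, margin


# Hypothetical collection in which every fifth document mentions "merger".
collection = [
    f"doc {i} merger" if i % 5 == 0 else f"doc {i}" for i in range(10_000)
]
rate, margin = estimate_responsive_rate(
    collection, lambda doc: "merger" in doc, sample_size=400
)
```

An estimate of this kind lets a party describe the likely yield of a repository without reviewing every document in it.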
Sampling Rate
The frequency at which analog signals are converted to digital values during digitization. The higher the rate, the more accurate the process.
SAN
Storage Area Network. A high-speed subnetwork of shared storage devices. A storage device is a machine that contains nothing but a disc(s) for storing data. A SAN’s architecture works in a way that makes all storage devices available to all servers on a LAN or WAN. As more storage devices are added to a SAN, they too will be accessible from any server in the larger network. In this case, the server merely acts as a pathway between the end user and the stored data. Because stored data does not reside directly on any of a network’s servers, server power is utilized for business applications, and network capacity is released to the end user. Also see Network.
Sandbox
A network or series of networks that are not connected to other networks.
Scalability
The capacity of a system to expand without requiring major reconfiguration or re-entry of data. For example, multiple servers or additional storage can be easily added.
Scale-to-Gray
An option to display a black and white image file in an enhanced mode, making it easier to view. A scale-to-gray display uses gray shading to fill in gaps or jumps (known as aliasing) that occur when displaying an image file on a computer screen. Also known as grayscale.
Scanner
An input device commonly used to convert paper documents into images. Scanner devices are also available to scan microfilm and microfiche.
Scanning
The process of converting a hard copy paper document into a digital image for use in a computer system. After a document has been scanned, it can be reviewed using field and full-text searching, instant document retrieval, and a complete range of electronic document review options.
Scanning Software
Software that enables a scanner to deliver industry standard formats for images in a collection. Enables the use of OCR and coding of the images.
Schema
A set of rules or conceptual model for data structure and content, such as a description of the data content and relationships in a database.
Scroll Bar
The bar on the side or bottom of a window that allows the user to scroll up and down through the window’s contents. Scroll bars have scroll arrows at both ends and a scroll box, all of which can be used to scroll around the window.
SCSI
Pronounced “skuzzy.” Small Computer System Interface. A common, industry standard, electronic interface (highway) between computers and peripherals, such as hard discs, CD-ROM drives, and scanners. SCSI allows for up to 7 devices to be attached in a chain via cables.
SDLT
Super DLT. A type of backup tape which can hold up to 220 GB or 330 CDs, depending on the data file format. See DLT.
Search
See Compliance Search, Concept Search, Contextual Search, Boolean Search, Full-Text Search, Fuzzy Search, Index, Keyword Search, Pattern Recognition, Proximity Search, QBIC, Sampling, and Search Engine.
Search Engine
A program that enables search for keywords or phrases, such as on web pages throughout the World Wide Web.
Sector
Bits and bytes make it possible for computers to perform computations and store data. For efficiency purposes, bytes are stored on discs in blocks of data called sectors, with most computer systems relying upon a sector size of 512 bytes of data (4096 bits). The sector is the smallest unit of storage on a computer storage device and is generally a power of 2 bytes in size.
Sectors are created and mapped when the computer storage device is low-level formatted, and they are written consecutively to discs on tracks. As each sector is created and written to the disc, the storage media is verified for accuracy; this verification process involves writing 512 bytes (4096 bits) of data to disc.
Sector Gap
Sectors consist of fixed blocks of storage space that usually contain 512 bytes of data. An equal number of sectors are written to each track on a floppy diskette, hard disc drive, and most storage devices; however, the circumference of the outside tracks is much larger than the circumference of the inside tracks. For this reason, much of the storage space is wasted on some storage devices; however, modern hard disc drives have eliminated much of this waste through the use of advanced data storage mapping techniques. In addition, on some storage devices the area between sectors on the larger tracks can be used for covert data storage and this area is referred to as sector gap.
Sequence Checking
A verification of the alphanumeric sequence of the key field in items to be processed.
Serif
The little cross bars or curls at the end of strokes on certain type fonts.
Server
Any computer on a network that contains data or applications shared by users of the network on their client PCs.
Service-level Agreement
A service-level agreement is a contract that defines the technical support or business parameters that a service provider or outsourcing firm will provide its clients. The agreement typically spells out measures for performance and consequences for failure.
SGML
Standard Generalized Markup Language. An informal industry standard for open systems document management which specifies the data encoding of a document’s format and content.
SGML/HyTime
A multimedia extension to SGML, sponsored by DOD.
SHA-1
Secure Hash Algorithm. An algorithm, specified by FIPS PUB 180-1, used for computing a condensed representation of a message or a data file.
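Python's standard `hashlib` module includes an implementation of SHA-1, so the "condensed representation" can be shown directly. The wrapper function name and the sample inputs are chosen only for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


original = sha1_hex(b"abc")
altered = sha1_hex(b"abd")  # a single character differs

# The digest for "abc" is a published SHA-1 test vector.
print(original)  # a9993e364706816aba3e25717850c26c9cd0d89d
print(original == altered)  # False
```

Because even a one-character change produces a different digest, hash values are often used to check whether a file has been altered.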
Sibling
A document that shares a common parent with the document in question (e.g., two attachments that share the same parent email or are sibling documents in the same Zip file). See Parent-child Relationships for more information.
Signature
See Certificate.
SIMM (Single, In-Line Memory Module)
A mechanical package (with “legs”) used to attach memory chips to printed circuit boards.
Simplex
One-sided page(s).
Skewed
Tilted, used to describe an image. See De-skewing.
Slack Space
Slack potentially contains randomly selected bytes of data from computer memory because DOS/Windows normally writes in 512 byte blocks called sectors. Clusters are made up of blocks of sectors. However, if there is not enough data in the file to fill the last sector in a file, DOS/Windows makes up the difference by padding the remaining space with data from the memory buffers of the operating system. This randomly selected data from memory is called 'RAM Slack' because it comes from the memory of the computer. RAM Slack can contain any information that may have been created, viewed, modified, downloaded, or copied during work sessions that have occurred since the computer was last booted. Thus, if the computer has not been shut down for several days, the data stored in file slack can come from work sessions that occurred in the past.
SLIP
Serial Line Internet Protocol. A connection to the Internet in which the interface software runs in the local computer, rather than the Internet’s.
Smart Card
A credit-card-size device which contains a microprocessor, memory, and a battery.
SMTP
Simple Mail Transfer Protocol. The protocol widely implemented on the Internet for exchanging email messages.
Software
Coded instructions (programs) that make a computer do useful work.
Software Application
A program that instructs a computer to perform a specific set of instructions or execute a process. Some software applications are user-driven like Microsoft Word or Notepad, while others are system-driven like the Windows system clock or automatic virus scanning programs.
Speckle
Imperfections in an image that do not appear on the original as a result of scanning paper documents. See De-speckling.
Splatter
Data that should be kept on one disc of a jukebox goes instead to multiple platters.
Spoliation
Generally, the intentional or negligent destruction or alteration of evidence when there is current litigation or an investigation or there is reasonable anticipation that either may occur in the near future. Some jurisdictions also define it as a failure to preserve information that may become evidence.
Spoofing
In computer networking, Internet Protocol (IP) address spoofing is the creation of IP packets with a forged (spoofed) source IP address. Because an IP address is sometimes referred to simply as an IP, the practice is also called IP spoofing.
SPP
Standard Parallel Port. See Centronics Interface.
SQL
Structured Query Language. A standard, fourth-generation programming language (4GL: a programming language that is closer to natural language and easier to work with than a high-level language). The popular standard for running database searches (queries) and reports.
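For illustration, the sketch below runs a simple SQL query and report using Python's built-in sqlite3 module and an in-memory database. The table name, column names, and sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# Build a small in-memory database (all names and values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "doc_id INTEGER PRIMARY KEY, custodian TEXT, pages INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (custodian, pages) VALUES (?, ?)",
    [("Smith", 3), ("Jones", 10), ("Smith", 7)],
)

# A query that reports document and page counts per custodian.
rows = conn.execute(
    "SELECT custodian, COUNT(*), SUM(pages) "
    "FROM documents GROUP BY custodian ORDER BY custodian"
).fetchall()
conn.close()

print(rows)  # [('Jones', 1, 10), ('Smith', 2, 10)]
```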
Stand-alone Computer
A single computer not connected to a network.
Status Bar
A bar at the bottom of a window that is used to indicate the status of a task. For example, when an email message is sent, the status bar will fill with dots indicating that a message is being sent.
Steganalysis
The process of detecting steganography by looking at variances between bit patterns and unusually large file sizes.
Steganography
The hiding of information within a more obvious kind of communication. Although not widely used, digital steganography involves the hiding of data inside a sound or image file. See Steganalysis.
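One simple way data can be hidden inside image or sound samples is to overwrite the least significant bit of each sample. The sketch below shows that idea on a plain byte string standing in for sample data. It is one illustrative technique, not a description of any particular tool, and the function names hide and reveal are assumptions for this example.

```python
def hide(cover: bytes, secret: bytes) -> bytes:
    """Store each bit of secret in the lowest bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover data is too small to hold the secret")
    out = bytearray(cover)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & 0xFE) | bit  # replace only the lowest bit
    return bytes(out)


def reveal(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of hidden data from the lowest bits of stego."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256))      # stand-in for image or audio sample values
stego = hide(cover, b"hi")
print(reveal(stego, 2))        # b'hi'
```

Each cover byte changes by at most one, which is why this kind of hiding is hard to notice by eye or ear and is instead detected by examining bit patterns.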
Storage Device
Any device that a computer uses to store information.
Storage Media
Any removable device that stores data. See magnetic or optical storage media.
Subjective Coding
The coding of a document using legal interpretation as the data that fills a field, versus objective data that is readily apparent from the face of the document (e.g., date, type, author, addresses, recipients, and names mentioned). Usually performed by paralegals or other trained legal personnel.
Subnet Address
In computer networks, a subnetwork or subnet is a range of logical addresses within the address space that is assigned to an organization. Subnetting is a hierarchical partitioning of the network address space of an organization (and of the network nodes of an autonomous system) into several subnets. Routers constitute borders between subnets. Communication to and from a subnet is mediated by one specific port of one specific router, at least momentarily.
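The partitioning described above can be sketched with Python's standard ipaddress module. The address block 192.168.0.0/24 and the host address are assumptions picked from private address space purely for illustration.

```python
import ipaddress

# Hypothetical address block assigned to an organization.
org_block = ipaddress.ip_network("192.168.0.0/24")

# Partition the block into four smaller subnets (/26 each).
subnets = list(org_block.subnets(new_prefix=26))
for subnet in subnets:
    print(subnet)

# Find which subnet a given host address belongs to.
host = ipaddress.ip_address("192.168.0.70")
home = next(s for s in subnets if host in s)
print(home)  # 192.168.0.64/26
```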
Subtractive Colors
Since the colors of objects consist of white light minus the color absorbed by the object, they are called subtractive. This is how ink on paper works. The subtractive colors of process ink are CMYK (Cyan, Magenta, Yellow and Black) and are specifically balanced to match additive colors (RGB).
Suspension Notice/Order
See Legal Hold.
SVGA
Super Video Graphics Array. A graphics adapter that exceeds the minimum VGA standard of 640 by 480 by 16 colors. Can reach 1600 by 1280 by 256 colors.
Swap File
A file used to temporarily store code and data for programs that are currently running. This information is left in the swap file after the programs are terminated and may be retrieved using forensic techniques. Also referred to as a page file or paging file.
SysAdmin
System administrator. The person in charge of keeping a network working.
SysOp
See SysAdmin.
System Registry
The system configuration files used by Microsoft Windows to store settings about user preferences, installed software, hardware, drivers, and other settings required for Windows to run correctly.
T
T1
A high-speed, high-bandwidth, leased-line connection to the Internet. T1 connections deliver information at 1.544 megabits per second.
T3
A high-speed, high-bandwidth, leased-line connection to the Internet. T3 connections deliver information at 44.736 megabits per second.
Tape Drive
A hardware device used to store data on a magnetic tape. Tape drives are usually used to back up large quantities of data due to their large capacity and low cost relative to other data storage options.
Taxonomy
The science of categorization, or classification, of things based on a predetermined system. In reference to websites and portals, a site’s taxonomy is the way it organizes its data into categories and subcategories, sometimes displayed in a site map.
TCP/IP
Transmission Control Protocol/Internet Protocol. A collection of protocols that define the basic workings of the features of the Internet.
Telephony
Converting sounds into electronic signals for transmission.
Templates
Sets of index fields for documents, providing framework for preparation.
Temporary File
A file stored on a computer for temporary use only, often created by an Internet browser. These temp files store information about websites that a user has visited and allow for more rapid display of the Web page when the user revisits the site. Forensic techniques can be used to track the history of a computer’s Internet usage through the examination of these temporary files. Temp files are also created by common office applications, such as word processing or spreadsheet applications.
Terabyte
A unit of 1,000 or 1,024 gigabytes, or approximately a trillion bytes.
TGA
Targa format. This is a scanned format—widely used for color-scanned materials (24-bit) as well as by various paint and desktop publishing packages.
Thin Client
A networked user computer that acts only as a terminal and stores no applications or user files. May have little or no hard drive space. See Client.
Thread
A series of postings on a particular topic. Threads can be a series of bulletin board messages (for example, when someone posts a question and others reply with answers or additional queries on the same topic). A thread can also apply to chats, where multiple conversation threads may exist simultaneously. See also Email Thread.
Thumb Drive
See Key Drive.
Thumbnail
A miniature representation of a page or item for quick overviews to provide a general idea of the structure, content, and appearance of a document. A thumbnail program may be standalone or part of a desktop publishing or graphics program. Thumbnails take considerable time to generate, but provide a convenient way to browse through multiple images before retrieving the one needed. Programs often allow clicking on the thumbnail to retrieve the full-size item.
TIFF
Tagged Image File Format. One of the most widely supported file formats for storing bit-mapped images. Files in TIFF format often end with a .tif extension. Also spelled TIF.
TIFF Group III (compression)
A one-dimensional compression format for storing black and white images that is utilized by many fax machines. See TIFF.
TIFF Group IV (compression)
A two-dimensional compression format for storing black and white images. Typically compresses at a 20-to-1 ratio for standard business documents. See TIFF.
Toggle
A switch that is either on or off, and reverses to the opposite when selected.
Toolbar
The row of buttons right below the menu that performs special functions quickly and easily.
Topology
The geometric arrangement of a computer system. Common topologies include:
• bus: network topology in which nodes are connected to a single cable with terminators at each end,
• star: local area network designed in the shape of a star, where all end points are connected to one central switching device or hub,
• ring: network topology in which nodes are connected in a closed loop; no terminators are required because there are no unconnected ends.
Star networks are easier to manage than ring topology.
Track
Each of the series of concentric rings contained on a hard drive platter.
Transfer
The process of moving or transmitting a file from one location to another, as between two programs or from one computer to another.
True Resolution
The true optical resolution of a scanner is the number of pixels per inch (without any software enhancements).
TWAIN
Toolkit Without An Interesting Name. A universal toolkit with standard hardware or software drivers for multimedia peripheral devices.
Typeface
There are over 10,000 typefaces available for computers. The general categories are:
• Oldstyle. Faces have slanted serifs, gradual thick to thin strokes, and a slanted stress. The “O” appears slanted.
• Modern. Faces have thin, horizontal serifs, radical thick to thin strokes, and a vertical stress. The “O” does not appear to slant.
• Slab serif. Faces have thick, horizontal serifs, little or no thick-to-thin in the strokes, and a vertical stress. The “O” appears vertical.
• Sans serif. Faces have no serifs.
• Script. From elaborate handwriting styles to casual, freeform, unconnected letter forms.
• Decorative unusual fonts. Designed to be very different and attention getting.
U
Ultrafiche
Microfiche that can hold 1,000 documents/sheet as opposed to the normal 270.
UMS
Universal Messaging System.
Unallocated Space
The area of computer media, such as a hard drive, that does not contain normally accessible data. Unallocated space is usually the result of a file being deleted. When a file is deleted, it is not actually erased, but is simply no longer accessible through normal means. The space that it occupied becomes unallocated space, i.e., space on the drive that can be reused to store new information. Until portions of the unallocated space are used for new data storage, in most instances, the old data remains and can be retrieved using forensic techniques.
Unitization
The assembly of individually scanned pages into documents. Physical unitization utilizes actual objects such as staples, paper clips, and folders to determine pages that belong together as documents for archival and retrieval purposes. Logical unitization is the process of human review of each individual page in an image collection using logical cues to determine pages that belong together as documents. Such cues can be consecutive page numbering, report titles, similar headers and footers, and other logical indicators. This process should also capture document relationships, such as parent and child attachments. See also Attachment.
UNIX
A software operating system.
Upgrade
A new or better version of hardware or software.
Upload
To send a file from one computer to another via modem, network, or serial cable. With a modem-based communications link, the process generally involves the requesting computer instructing the remote computer to prepare to receive the file on its disc and wait for the transmission to begin.
URI
Uniform Resource Identifier. A compact string of characters used to identify or name a resource. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in schemes defining a specific syntax and associated protocols.
URL
Uniform Resource Locator. The addressing system used in the World Wide Web and other Internet resources. The URL contains information about the method of access, the server to be accessed, and the path of any file to be accessed. A URL appears like this: http://thesedonaconference.org.
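The three pieces of information named above (method of access, server, and path) can be pulled apart with Python's standard urllib.parse module, as in the sketch below. The path portion of the example address is hypothetical and was added only so that the path component is not empty.

```python
from urllib.parse import urlsplit

# The host comes from the example above; the path is hypothetical.
parts = urlsplit("http://thesedonaconference.org/hypothetical/path.html")

print(parts.scheme)    # method of access: http
print(parts.hostname)  # server to be accessed: thesedonaconference.org
print(parts.path)      # path of the file: /hypothetical/path.html
```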
User-added Metadata
Data or work product created by a user while reviewing a document, including annotations and subjective coding information.
V
V.32bis
The ITU standard for 14.4 kbps modem communications. See ITU.
V.34
The ITU standard for 28.8 kbps modem communications. See ITU.
Validate
To confirm or ensure well-grounded logic and true and accurate determinations.
VAR/VAD/VASD
Value-Added Reseller/Value-Added Dealer/Value-Added Specialty Distributor. Companies or people who sell computer hardware or software and add value in the process. Usually, the value added is specific technical or marketing knowledge and/or experience.
VDT
Video Display Terminal. Generic name for all display terminals.
Vector
Representation of graphic images by mathematical formulas. For instance, a circle is defined by a specific position and radius.
Vendor-added Metadata
Data created and maintained by the electronic discovery vendor as a result of processing the document. While some vendor-added metadata has direct value to customers, much of it is used for process reporting, chain of custody, and data accountability. The opposite of customer-added metadata.
Verbatim Coding
Extracting data from documents in a collection in a way that matches exactly as the information appears in the documents.
Version, Record Version
A particular form or variation of an earlier or original record. For electronic records the variations may include changes to file format, metadata, or content.
Vertical De-duplication
A process through which duplicate data is eliminated within a single custodial or production data set. See Content Comparison, File Level Binary Comparison, Horizontal De-duplication, Metadata Comparison.
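A minimal sketch of the idea follows, assuming duplicates are identified by comparing a SHA-1 hash of each file's content. That choice is only one possible comparison method, and the function name and sample records are hypothetical.

```python
import hashlib


def vertical_dedupe(documents):
    """Keep the first copy of each distinct content hash per custodian.

    documents is a list of (custodian, name, content_bytes) tuples.
    Identical content held by different custodians is NOT removed,
    because comparison happens only within a single custodian's set.
    """
    seen = set()
    kept = []
    for custodian, name, content in documents:
        key = (custodian, hashlib.sha1(content).hexdigest())
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept


docs = [
    ("Smith", "memo.doc", b"quarterly results"),
    ("Smith", "copy of memo.doc", b"quarterly results"),  # duplicate within Smith
    ("Jones", "memo.doc", b"quarterly results"),          # same content, other custodian
]
print(vertical_dedupe(docs))  # [('Smith', 'memo.doc'), ('Jones', 'memo.doc')]
```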
VESA
Video Electronics Standards Association. Concentrates on computer video standards.
VGA
Video Graphics Array. A PC industry standard, first introduced by IBM in 1987, for color video displays. The minimum dot (pixel) display is 640 x 480 x 16 colors. Super VGA was introduced at 800 x 600 x 16, then 256 colors. VGA can extend to 1024 x 768 x 256 colors.
Video Scanner Interface
A type of device used to connect scanners with computers. Scanners with this interface require a scanner control board designed by Kofax, Xionics, or Dunord.
Virus
A self-replicating program that spreads by inserting copies of itself into other executable code or documents. A program into which a virus has inserted itself is said to be infected, and the infected file (or executable code that is not part of a file) is a host. Viruses are a kind of malware (malicious software). Viruses can be intentionally destructive, for example by destroying data, but many viruses are merely annoying. Some viruses have a delayed payload, sometimes referred to as a bomb. The primary downside of viruses is uncontrolled self-reproduction, which wastes or overwhelms computer resources.
Vital Record
A record that is essential to the organization’s operation or to the reestablishment of the organization after a disaster.
Vlog
Short for videolog, a vlog is a Weblog that uses video as its primary medium for distributing content. Vlog posts are usually accompanied by text, image, and other metadata to provide a context or overview for the video.
VoIP
Voice over Internet Protocol. Telephonic capability across an IP connection; increasingly used in place of standard telephone systems.
Volume
A specific amount of storage space on computer storage media such as hard drives, floppy discs, and CD-ROM discs. In some instances, computer media may contain more than one volume, while in others, one volume may be contained on more than one disc.
Volume Boot Sector
When a partition is formatted to create a volume, a volume boot sector is created to store information about the volume. One volume contains the operating system and its volume boot sector contains code used to load the operating system when the computer is booted up.
VPN
Virtual Private Network. A secure network that is constructed by using public wires to connect nodes. For example, there are a number of systems that enable creation of networks using the Internet as the medium for transporting data. These systems use encryption and other security mechanisms to ensure that only authorized users can access the network and that the data cannot be intercepted.
W
WAV
File extension name for Windows sound files. .wav files can reach 5 Megabytes for one minute of audio.
Website
A collection of Uniform Resource Identifiers (URIs), including Uniform Resource Locators (URLs), in the control of one administrative entity. May include different types of URIs (e.g., file transfer protocol sites, telnet sites, as well as World Wide Web sites). See URI and URL.
Windows Swap (Page) File
Microsoft Windows-based computer operating systems utilize a special file as a scratch pad to write data when additional random access memory is needed, called Windows Swap Files or Windows Page Files. Windows Swap and Page Files are potentially very large and most computer users are unaware of their existence; the potential exists for these huge files to contain remnants of word processing, email messages, Internet browsing activity, database entries, and almost any other work that may have occurred during past Windows work sessions. This situation creates a significant security problem since the potential exists for data to be transparently stored within the Windows Swap File without the knowledge of the computer user, which can occur even if the work product was stored on a computer network server. The result is a significant computer security weakness that can be of benefit to the computer forensics specialist.
Winnowing
See Chaff.
Wipe Drives
The process of sanitizing and clearing all data off a hard drive. This is done to ensure there is no data remaining on the target hard drive. The methods of wiping drives include overwriting, degaussing and drive destruction.
Workflow, Ad Hoc
A simple manual process by which documents can be moved around a multi-user review system on an as-needed basis.
Workflow, Rule-Based
A programmed series of automated steps that route documents to various users on a multi-user review system.
Workgroup
A group of computer users connected to share individual talents and resources as well as computer hardware and software—often to accomplish a team goal.
World Wide Web
The WWW is made up of all of the computers on the Internet which use HTML-capable software (Netscape, Explorer, etc.) to exchange data. Data exchange on the WWW is characterized by easy-to-use graphical interfaces, hypertext links, images, and sound. Today the WWW has become synonymous with the Internet, although technically it is really just one component.
WORM
Write-Once, Read-Many. Data storage devices where the space on the discs can only be written once. The data is permanently stored. This is often today’s primary media for archival information. Common disc sizes run from 5.25” (1.3 gigabytes) to 12” (8 to 10 gigabytes) capacities. There is also a 14” disc (13 to 15 gigabytes), only manufactured by Kodak’s optical storage group. WORMs can also be configured into jukeboxes. There are various technologies. The expected viable lifetime of a WORM is at least 50 years. Since it’s impossible to change, the government treats it just like paper or microfilm and it is accepted in litigation and other record-keeping applications. On the negative side, there is no current standard for how WORMs are written. The only ISO standard is for the 14” version, manufactured only by one vendor. A 5.25” standard is emerging from the European Computer Manufacturing Association but is not yet accepted. Further, WORM discs are written on both sides, but there are currently no drives that read both sides at the same time. As for speed, WORM is faster than tape or CD-ROM, but slower than magnetic. Typical disc access times run between 40 and 150 milliseconds (compared with 11 ms for fast magnetic discs and 300 ms for CD-ROM). Data transfer rates run between 1 and 2 MB/sec (compared with 5 to 10 for magnetic discs and 600KB/sec for CD-ROM).
WYSIWYG
“What You See Is What You Get.” Display and software technology which shows on the computer screen exactly what will print. Often requires a large, high-density monitor.
X
X.25
A standard protocol for data communications.
XML
Extensible Markup Language. A specification developed by the W3C (World Wide Web Consortium, the Web development standards board). XML is a pared-down version of SGML, designed especially for Web documents. It allows designers to create their own customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations.
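To illustrate customized tags, the sketch below parses a small XML fragment with Python's standard xml.etree.ElementTree module. The tag names (document, custodian, pageCount) and the values are hypothetical and were invented for this example.

```python
import xml.etree.ElementTree as ET

# A hypothetical record described with designer-defined tags.
xml_text = """<document id="DOC-0001">
  <custodian>Smith</custodian>
  <pageCount>3</pageCount>
</document>"""

root = ET.fromstring(xml_text)

# Interpret the tagged data as an ordinary dictionary.
record = {
    "id": root.get("id"),
    "custodian": root.findtext("custodian"),
    "pages": int(root.findtext("pageCount")),
}
print(record)  # {'id': 'DOC-0001', 'custodian': 'Smith', 'pages': 3}
```

Because the tags name the data they carry, a receiving application can interpret the record without knowing how the sending application stores it internally.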
Z
ZIP
An open standard for compression and decompression used widely for PC download archives. ZIP is used on Windows-based programs such as WinZip and Drag and Zip. The file extension given to ZIP files is .zip.
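As a short illustration of compression and decompression, the sketch below writes and reads a ZIP archive in memory using Python's standard zipfile module. The member name memo.txt and its contents are hypothetical.

```python
import io
import zipfile

# Repetitive sample text compresses well; the content is hypothetical.
text = b"responsive document text " * 200

# Compress: write one member into an in-memory .zip archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("memo.txt", text)

# Decompress: reopen the archive and read the member back.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = archive.read("memo.txt")
    info = archive.getinfo("memo.txt")

print(names)                                # ['memo.txt']
print(restored == text)                     # True
print(info.compress_size < info.file_size)  # True
```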
Zip® Drives
A magnetic storage device that can hold between 100 and 250 megabytes of data.
Zone OCR
An add-on feature of the imaging software that populates document templates by reading certain regions or zones of a document, and then placing the text into a document index.