Archiving — A Fallacy Print

   'Tis a Sad Day in Mudville.  And in case you do not understand the reference, I would recommend reading "Casey at the Bat," written by Ernest Thayer in 1888.  While I am not personally much interested in baseball, this poem is so intertwined with American culture as to represent the journey that many may, at some point in life's struggle, be forced to make.  It also echoes the tenets of Don Quixote, in which a person struggles against those who continually place roadblocks, real or perceived, in the path of one attempting to make the world better for the downtrodden.  In other words, one should continue to tilt one's lance at windmills even if cultural norms indicate it is a futile cause.

   Recently I commented that next on the archival to-do list was the continued digitizing of History of Bedford, Somerset and Fulton Counties, Pennsylvania (Chicago, Ill.: Waterman, Watkins & Co., 1884).  The individual responding noted that this work, one of the first authoritative publications about the region, was available on both Ancestry.Com and Archive.Org.  They were almost correct, erring only in that, of those two sites, the book can be found only at Ancestry.Com.  The publication found at Archive.Org is a later work without Fulton County, Pennsylvania.  That being said, the first referenced work can be found online at two locations, Ancestry.Com and HeritageQuestOnline.Com.  These are both "paid" web sites, whereas Archive.Org is a free web site.

   The scope of this blog post shall be, in brief, the fallacy of such material being presented as archival in nature.  Rather than going into the underlying technology, the differences shall be presented in a form that all should be able to readily comprehend, since a picture is worth a thousand words.  The underlying technological differences are minor, with the major distinction between the two, archival and non-archival, being the quality of the equipment used and the post-acquisition work performed.  Since Archive.Org does not have the first historical work but a comparison is still required, the later volume without Fulton County will be used for discussion purposes.  Finally, to keep the comparison apples to apples and oranges to oranges, an archival and a non-archival digital workflow shall be demonstrated.

   Originally this posting was going to include images from both the Ancestry and Heritage Quest Online publications.  However, it immediately became apparent that the works were one and the same, the only difference being in the delivery.  Ancestry.Com does not allow its users to download a group of pages (only single pages), while HeritageQuestOnline.Com allows the user to download a range of pages limited only by the speed of the user's Internet connection.  With a dial-up connection you can download only roughly a megabyte's worth of pages; a fast Internet connection allows a download of about four megabytes.  Neither hosting company allows its files to be downloaded as a searchable PDF.  End users must create this type of file on their own.

[Images, side by side: Ancestry / Heritage Quest image (left); Brethren Archives image (right)]

   The image to the left is from Heritage Quest Online and is a one-bit (black and white) scan.  The image to the right is a digital scan created by the staff at BrethrenArchives.Com.  Both images have been scaled down to show side by side, with their dimensions matched so that a true comparison can be made.  At this reduced scale the differences are not readily apparent, although they already make reading difficult.  It is only when the images are viewed at their true scale, 8½" X 11", that the difference stands out.  To further illustrate the true differences, and why the first should not be considered archival, a PDF has been prepared which can be downloaded here.  The file shows the images side by side and is a sampling of pages with text and photographs.
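   For readers wondering why a one-bit scan is so damaging, particularly to photographs, the short Python sketch below shows what happens when every gray tone is forced to pure black or pure white.  It is a simplified illustration only: it assumes 8-bit grayscale input and a fixed threshold of 128, a hypothetical value, whereas actual bitonal scanners may use adaptive thresholds or dithering.

```python
from typing import List

# Hypothetical fixed threshold, chosen only for illustration.
THRESHOLD = 128

def to_bitonal(pixels: List[int], threshold: int = THRESHOLD) -> List[int]:
    """Map each 8-bit gray value (0-255) to 0 (black) or 255 (white)."""
    return [255 if p >= threshold else 0 for p in pixels]

# Every possible gray level, like the smooth midtones of a photograph.
ramp = list(range(256))
bitonal = to_bitonal(ramp)

print(len(set(ramp)))     # 256 distinct tones before
print(len(set(bitonal)))  # 2 distinct tones after
```

   Everything between the darkest ink and the whitest paper is thrown away, which is why text may survive a one-bit scan while photographs do not.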

   In the foregoing demonstration neither Ancestry.Com nor HeritageQuestOnline.Com has in any manner alluded to the works being archival in nature.  Such is not the case with Archive.Org.  The concept of Archive.Org is one of the best ever discussed since the advent of the Internet and digital scanners.  Unfortunately, at least to this correspondent, a mission statement with viable and understandable links to digitizing for the common man has yet to surface.  Nor does the group state what standards are to be followed by those submitting material to be included in the archives.  In essence, the site is ad hoc, reminiscent in nature of the Internet itself.  As is often the case, a group of individuals who believe they understand the methods have convinced others that they do, and in the interim have been able to add more to their résumés.  It is like a college graduate listing "ditch digger" on his résumé without knowing how to wield a shovel!

   What borders on the criminal is how the concept has been usurped by firms wishing to make a fast buck.  By misleading the more traditional archival facilities into believing that the services they offer are archival, these firms are facilitating the corruption of a heritage that is slowly being lost.  Again, a picture is worth a thousand words.

   Since Archive.Org is supposedly driven by librarians, many with a master's or PhD degree, let us first discuss metadata as it pertains to a digital project.  Many of these concepts can be read in Technical Guidelines for Digitizing Archival Materials for Electronic Access (U. S. National Archives & Records Administration, 2004) by Steven Puglia, Jeffrey Reed and Erin Rhodes.  Where specific metadata guidelines are not covered, the document provides links to them.

   The metadata provided for any material downloaded from Archive.Org can be easily found.  When you first visit the page to download your preferred book, you are offered strings of text near the bottom of the screen, generally immediately below the "Reviews" section, labeled "selected metadata."  It should not be "selected metadata" but the full specifics for the volume presented.  For those not quite aware of what "metadata" is, the easiest description is that it is the same thing as the card catalog we used to access a library's collection in the old days.  Nothing less, nothing more.  Again, a side-by-side comparison will be shown that will hopefully demonstrate the things that need repair at Archive.Org.  For demonstration purposes the selected book shall be Minutes of the Annual Meetings of the Brethren...

Archive.Org metadata:
Copyright-evidence-operator: Steven F Radzikowski
Copyright-region: US
Copyright-evidence: Evidence reported by Steven F Redzikowski...
Copyright-evidence-date: 20080603145253
Scanningcenter: nj
Mediatype: texts
Identifier: minutesofannualm00germ
Ppi: 400
Camera: Canon 5D
Operator: scanner-medline-masson@...
Scandate: 20080605004313
Imagecount: 454
Identifier-ark: ark:/13960/t9s183001
Sponsordate: 20080630

BrethrenArchives.Com metadata:
Repository: Personal archives of A. Wayne Webb
Publication Date: 1917
Publication Title: Minutes of the Annual Meeting of the Brethren from 1778 to 1876; also, Supplemental Minutes from 1877 to 1917; and Appendix
Record Group ID: German Baptist Brethren Digital Archives
Record Group Descriptor: A book describing the articles and rulings of the various annual meetings of the German Baptist Brethren church to 1883.  Also includes minutes of the German Baptist, now Church of the Brethren, from 1883 to 1917.
Series: History book
Scanning Site: A. Wayne Webb, Millville, NJ
Source Format: Paper, 5.70" X 8.15" (approx.)
Scanner Operator: A. Wayne Webb
Scanner: Microtek 9800XL
Dynamic Range: 3.7 Dmax
File Format: TIFF
Color Mode: Adobe RGB (1998)
Spatial Resolution: 400ppi
Image Quality: 2 (no obvious visible defects)
Scale: 100%
Gamma Correction: Adobe RGB (1998)
Color Calibration: Adobe RGB (1998)
Compression: LZW
Pixel Array: 2270 X 3280 (varies)
Record Set Creation Date: Feb 5, 2011
Keywords: German Baptist Brethren, church, congregation, Church of the Brethren, annual meetings, minutes
Notes: Contains 481 images including book and metadata target.  Scanned in color at 400 pixels per inch using the Adobe RGB (1998) color profile with no tonal adjustments, and saved with no adjustments or scaling.  Scanner used: Microtek 9800XL with 3.7 Dmax dynamic range.
This collection was created from a duplicate set of images with the red color channel adjusted using a shadow setting of 79 and a highlight setting of 238, the green channel set at 84 and 234, and the blue channel at 68 and 216, respectively.  The scanner was calibrated using a Kodak Q-60 Color Input Target (IT8.7/2-1993).
Rights Usage Terms: None.
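
   The channel settings listed in the Notes above work much like the shadow (black point) and highlight (white point) inputs of a "Levels" tool in a graphics editing program.  As a rough sketch, assuming a simple linear stretch with no midtone (gamma) change and 8-bit channel values, the arithmetic for a single pixel might look like the following.  The editing program actually used may compute this somewhat differently, and the sample pixel values are hypothetical.

```python
from typing import Tuple

# Per-channel (shadow, highlight) input settings as listed in the Notes above.
LEVELS = {"red": (79, 238), "green": (84, 234), "blue": (68, 216)}

def apply_levels(value: int, shadow: int, highlight: int) -> int:
    """Linearly stretch [shadow, highlight] onto 0-255 (assumed: no gamma change)."""
    if value <= shadow:
        return 0      # at or below the shadow setting becomes black
    if value >= highlight:
        return 255    # at or above the highlight setting becomes white
    return round((value - shadow) * 255 / (highlight - shadow))

def adjust_pixel(rgb: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Apply each channel's own settings to one 8-bit RGB pixel."""
    r, g, b = rgb
    return (apply_levels(r, *LEVELS["red"]),
            apply_levels(g, *LEVELS["green"]),
            apply_levels(b, *LEVELS["blue"]))

# Hypothetical sample values: a yellowed paper tone and a dark ink tone.
print(adjust_pixel((230, 220, 180)))  # (242, 231, 193): every channel lightened
print(adjust_pixel((70, 75, 60)))     # (0, 0, 0): clipped to black
```

   In this sketch the blue channel, with the narrowest range (68 to 216), receives the largest stretch.  Since a yellow cast is a shortage of blue relative to red and green, stretching blue the most is consistent with lifting the yellowing of aged paper.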

   There is a vast difference between what is termed "archival" at Archive.Org and what is stressed through National Archives publications as it pertains to metadata, or card catalogs to those of us of the older generation.  It is from guidelines suggested by the National Archives that the metadata structure used by BrethrenArchives.Com was derived.  Furthermore, this metadata structure is adhered to even when describing the individual pages of each record set, whether the set holds one or two items or is a 1,000-page reference work.  The only changes are the pixel array of that particular item and the inclusion of its file size.  The fallacy of non-adherence to archival metadata standards cannot be laid solely at the feet of Archive.Org.  The only blame that can be laid on their doorstep is for not requiring the supplying institutions and/or firms to follow a comprehensive set of metadata guidelines.  And this from a group of librarians!!!
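
   One way to picture the page-level metadata described above, where each page carries the full record-set description and only the pixel array and file size change, is as a record that is copied and then selectively overridden.  The Python sketch below is an illustration under that assumption.  The class and function names, the page file name, and the page dimensions and file size are hypothetical; the shared field values come from the BrethrenArchives.Com record shown earlier.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class ImageMetadata:
    """Hypothetical subset of the record-set fields shown above."""
    repository: str
    publication_title: str
    scanner: str
    spatial_resolution_ppi: int
    color_mode: str
    file_format: str
    pixel_array: Tuple[int, int]
    page_file: Optional[str] = None        # only set on page-level records
    file_size_bytes: Optional[int] = None  # only set on page-level records

# Record-set level description (values taken from the record above).
RECORD_SET = ImageMetadata(
    repository="Personal archives of A. Wayne Webb",
    publication_title=("Minutes of the Annual Meeting of the Brethren from 1778 "
                       "to 1876; also, Supplemental Minutes from 1877 to 1917; "
                       "and Appendix"),
    scanner="Microtek 9800XL",
    spatial_resolution_ppi=400,
    color_mode="Adobe RGB (1998)",
    file_format="TIFF",
    pixel_array=(2270, 3280),  # listed as "varies" in the record
)

def page_metadata(base: ImageMetadata, page_file: str,
                  width: int, height: int, size_bytes: int) -> ImageMetadata:
    """Copy every record-set field, overriding only the page-specific ones."""
    return replace(base, page_file=page_file,
                   pixel_array=(width, height), file_size_bytes=size_bytes)

# Hypothetical page: file name, dimensions and size are illustrative only.
page = page_metadata(RECORD_SET, "page_0001.tif", 2268, 3282, 22_345_678)
print(page.scanner, page.pixel_array, page.file_size_bytes)
```

   Because the record is frozen and copied rather than edited, every page is guaranteed to carry the same scanner, resolution, and color information as the set it belongs to.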

   We shall now show just what is termed archival and what actually exists online.  This criticism, yet again, is not solely the fault of what is in essence an online repository, but rests squarely in the laps of the supplying institutions, and through them with the firms that they retain.  You can think of Archive.Org as a library that has been donated books originally created on paper.  Being the repository, it should be their responsibility, and a necessary requirement, to ascertain the quality of their holdings.  This is obviously not being done, and there is no structure in place to allow for this vetting process.  Shameful!

   Using the same Minutes of the Annual Meeting of the Brethren..., the copy presently online at Archive.Org shall be compared to the edition held by BrethrenArchives.Com.  I have taken the liberty of downloading the copy held by Archive.Org and removing all but the pages needed to facilitate the comparison.  The two editions are slightly different in that their copy is the 1876 edition while the edition held by BrethrenArchives.Com is the newer 1917 edition.  It is the workmanship, not the edition, that is being compared.

[Images, side by side: Archive.Org image (left); BrethrenArchives.Com image (right)]

   On the surface there does not appear to be much difference between the two images.  But careful study may show that, for some unknown reason, the text of the left image sits too close to the margins of the page.  The images have been severely cropped.  That is not archival!!!  Also, and just as important, the image is color distorted to the point where it seems that black text was laid atop a yellow piece of paper.  A term gaining currency for this style of digitizing is "informational" scanning.  The information is there, but out of context with the original publication.  Buy a book, trim off its unneeded margins, and then run it through a scanner or camera.  Again, this is not archival, and an "informational" project should not be equated with an "archival" digital project.  Neither the storage of the originals nor that of the resulting images is open to discussion at this time, as the apparent goal of those supporting Archive.Org (to pad their résumés?) is the compiling of PDFs and not an archive.

   The real harm of "informational" scanning becomes readily apparent when the reference work is viewed as a whole.  The same set of pages from the two works is shown below, and it is left to the reader to either agree or disagree with the statements made in this posting.  It is recommended that you scroll to where you can see pieces of both publications and then page through them, comparing them.  And then ask yourself this question: "What do you want to leave to your descendants and to history?"  You may also notice that the BrethrenArchives.Com edition has had its pages tonally adjusted so as to appear somewhat closer to what they may originally have looked like.  That is better than turning them into black and white images, or, even worse, leaving them as yellowed husks of their former selves.  The photographs are where the differences really slap the reader in the face!



   So if you feel that Ancestry.Com, HeritageQuestOnline.Com or Archive.Org is satisfactory for establishing an archive that will likely still be around in the 23rd century, then this post has been all for naught.  But if you feel, as I do (and others less vocal), that there has to be something better, then voice your opinion!  Don't sit on the fence while it rots away beneath you!

   I am extremely disappointed that what I first envisioned as a digital project dedicated to the German Baptist Brethren church has devolved into what I have described above.  As with Archive.Org, it was a beautiful thought that has become a muddied, bastardized shadow of its former self.  After being politely but pointedly shown the door by those who thought themselves better qualified to craft the project, I washed my hands of the whole lot.  To make matters worse, some who know it was the wrong direction have, through apathy, allowed it to progress down a path they see as detrimental.  Perhaps if this group of Brethren "historians" ever takes the time to read and study the material that was provided to them, instead of relying on the used car salesman, they will come to understand where they went wrong.  Doubtful, but hope springs eternal!

   A strong recommendation would be for this group to bring on board someone with strong managerial skills who knows what the job should encompass and who is cognizant of the particulars of the archival skill set of the new era, i.e. digital archiving.  Not strong on theory, but strong in experience.  The ability to manage such a project from the perspective of a graphics editing program would, obviously, be a prime requisite.