Find information about PDF files in CONTENTdm and learn how to import single-item and multiple-page PDF files in the Project Client.

Portable Document Format (PDF) is a format created by Adobe® and used for electronic document distribution and exchange. CONTENTdm provides features for efficient processing of born-digital documents in PDF format. PDF files and PDF compound objects can be displayed inline in the Item Viewer and Compound Object Viewer by using Adobe Reader®. The PDF features include automatic conversion of multiple-page PDF files into compound objects, creation of thumbnail images from PDF files, and full text extraction. Additionally, pages of a compound object automatically generated from a PDF file do not count toward the total number of items on the server.

Before you decide to use PDF over another format, consider whether your source materials are well-suited to this format, and whether your end-user experience would be optimized by using PDF. For example, PDF files are ideal for documents that were initially created as digital documentation, such as theses and city council minutes. PDF files are neither efficient nor optimal for the end-user experience with scanned images, books, maps or newspapers.

Additionally, PDF is not ideal for scanned images because an item that has been scanned does not automatically contain embedded text. For scanned images, you can use the CONTENTdm OCR Extension for generating full text. PDF files created from images can be very large and slow to download for online viewing. For a better end-user experience, you can use CONTENTdm to create JPEG2000 or JPEG display images from scanned TIFF files, rather than converting the TIFF files to PDF files.

To view PDF files and PDF compound objects in the Project Client, you must have Adobe Reader installed. If it is not already installed, install it.

A single PDF file can contain many pages. Regardless of the number of pages, it is a single file and is uploaded as a single file. You can import multiple PDF files using the Add multiple items wizard.

Depending on how your collection is configured, multiple-page PDF files can be added to your collection to be viewed as single items or, if PDF conversion is enabled, they can be automatically converted to PDF compound objects.

Note: To ensure an optimal end-user experience, PDF files (or pages of a compound object) larger than 20 MB are not loaded inline in any of the item viewers. These larger files can be saved to the desktop or opened outside of the browser.

Another way that PDF files are different from other files is that the text from PDF files is extracted and placed in a full text search field when PDF files are approved and added to a collection.

Note: If your PDF file was created from a born-digital document, such as a Microsoft Word file, it will almost always have embedded text. If your PDF file was created from scanned TIFF images, it does not have embedded text unless you have taken the additional step to OCR the image (or PDF file) and add that text to the PDF.

CONTENTdm supports integrated OCR functionality through the OCR Extension. Using the OCR Extension, full text can be generated from JPEG2000, JPEG, PNG, GIF and TIFF files. (The automatic text extraction for PDF files mentioned above is separate functionality and does not require the OCR Extension.)

Automatic text extraction applies when the collection that the PDF is being added to has a full text search field and that field is empty when the item is added to the collection. To check whether your PDF file has embedded text, save it as a text file. If the text file contains the text, then the PDF has embedded text.

Thumbnail images can be automatically generated for PDF files based on the first page of the PDF, or you can specify a custom thumbnail.

PDF compound objects (of the type monograph) are automatically created when multiple-page PDF files are added to a collection and approved, if that collection has been configured to enable PDF conversion or if you have configured the Processing settings in the Project Settings Manager. The page order of the PDF compound object matches the page order of the original, multiple-page PDF file. Each page of the PDF file has a metadata record after it is added to a collection, but the digital item associated with it in CONTENTdm is virtual (i.e., a link to the related page in the PDF file).
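The 20 MB inline-viewing limit and the embedded-text check described in this article can be sketched as a small check to run before you import a file. This is only an illustration and is not part of CONTENTdm: the function names (`loads_inline`, `has_embedded_text`, `triage`) are hypothetical, the script assumes you have already saved the PDF as a text file yourself, and it assumes "20 MB" means 20 × 1024 × 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes. The documentation
# does not say whether decimal or binary megabytes are meant.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def loads_inline(size_bytes: int, limit: int = INLINE_LIMIT_BYTES) -> bool:
    """Return True if a file of this size is not larger than the inline limit."""
    return size_bytes <= limit


def has_embedded_text(exported_text: str) -> bool:
    """Return True if the text export contains any non-whitespace characters."""
    return bool(exported_text.strip())


def triage(pdf_path: Path, text_export_path: Path) -> dict:
    """Summarize one PDF and its manually saved text export (hypothetical helper)."""
    size = pdf_path.stat().st_size
    text = text_export_path.read_text(encoding="utf-8", errors="replace")
    return {
        "size_bytes": size,
        "loads_inline": loads_inline(size),
        "has_embedded_text": has_embedded_text(text),
    }
```

For example, `triage(Path("minutes.pdf"), Path("minutes.txt"))` would report whether the (hypothetical) file `minutes.pdf` is small enough to be shown inline and whether its text export is non-empty. An empty export suggests the PDF came from scanned images and would need OCR before full text is available.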