Firing custom event/file inspection on PDF upload

Hello all

I'm trying to emulate the functionality which the core has for populating the width and height file attributes when an image is uploaded, but instead to do this for PDFs with a different attribute type. I can't find how to make the file inspection 'fire' on upload, however.

I copied /concrete/core/libraries/image_file_type_inspector.php to /libraries/pdf_file_type_inspector.php and changed the code to this:

class Concrete5_Library_PdfFileTypeInspector extends FileTypeInspector {
   public function inspect($fv) {
      // sets an attribute
      $att = FileAttributeKey::getByHandle('my_attribute');
      $fv->setAttribute($att, 'test');
   }
}


I did this in the hope of making something happen when a PDF is uploaded. The code which looks for custom file type handler files is in /concrete/core/libraries/file/type.php:

public function getCustomInspector() {
   $script = 'file/types/' . $this->getCustomImporter();
   if ($this->pkgHandle != false) {
      Loader::library($script, $this->pkgHandle);
   } else {
      Loader::library($script);
   }
   $class = Object::camelcase($this->getCustomImporter()) . 'FileTypeInspector';
   $cl = new $class;
   return $cl;
}


I'm not sure when this is fired though -- it seems to be only with custom packages, but I'm probably wrong.

Does anyone know how to fire off a custom file handler when the relevant file type is uploaded?

Any help appreciated :)

melat0nin

jshannon replied (Best Answer):
Hi. I did this a year or two ago. It was a pretty cool solution. I even invoked imagemagick to create a thumbnail of the PDFs.

I can't remember exactly what I did, so I might be forgetting an entire file. Also note that I did this for 5.4, though I'd be surprised if these APIs have been touched:

1. You need to "register" the file type handler, much like you'd do with an event. This is in my package's on_start(), but could be in site_events or something:

FileTypeList::getInstance()->define('pdf', t('PDF'), FileType::T_DOCUMENT, 'ampPdf', false, false, 'package_handle');


2. Then in package_handle/libraries/file/types/ampPdf.php :

Like the rest of c5, you need to be careful about capitalization, class names, etc....
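
To make the naming rule concrete: judging by the getCustomInspector() code quoted in the first post, the handle passed as the fourth argument to define() seems to decide both the library file that gets loaded and the class that gets instantiated. Here is a minimal sketch of that mapping, runnable outside concrete5. sketchCamelcase() is a hypothetical stand-in for Object::camelcase(), and the other function names are made up for illustration:

```php
<?php
// Sketch only: how a custom-inspector handle appears to map to a file and a
// class, going by the getCustomInspector() code quoted earlier in the thread.

// Hypothetical stand-in for concrete5's Object::camelcase(); the real helper
// may treat edge cases differently.
function sketchCamelcase($handle) {
    $words = ucwords(str_replace(array('_', '-', '/'), ' ', $handle));
    return str_replace(' ', '', $words);
}

// Script name handed to Loader::library(), relative to a libraries/ directory.
function sketchInspectorScript($handle) {
    return 'file/types/' . $handle;
}

// Name of the class that is then instantiated.
function sketchInspectorClass($handle) {
    return sketchCamelcase($handle) . 'FileTypeInspector';
}

echo sketchInspectorScript('ampPdf'), "\n"; // file/types/ampPdf
echo sketchInspectorClass('ampPdf'), "\n";  // AmpPdfFileTypeInspector
```

So the handle 'ampPdf' should resolve to libraries/file/types/ampPdf.php and a class named AmpPdfFileTypeInspector, which is what the code in this step uses.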

class AmpPdfFileTypeInspector extends FileTypeInspector {
    public function inspect($fv) {
        $page = 1; //not sure if it'll be possible to do multiple page PDFs in this way
        $path = $fv->getPath();
        $page = $page - 1;
        // prefer the binaries in /usr/local/bin, otherwise fall back to /usr/bin
        if (@is_executable("/usr/local/bin/identify")) {
            $bin = '/usr/local/bin';
        } else {
            $bin = '/usr/bin';
        }
        if (! @is_executable("$bin/identify") || ! @is_executable("$bin/convert") || ! @is_executable("$bin/gs")) {
            throw new Exception("PDF Inspector->inspect(): identify, convert and gs are necessary");
        }
        putenv("PATH=$bin:" . getenv('PATH'));
        //get PDF info

melat0nin replied:
Thanks for the reply. I presume your approach requires a package installation?

I managed to achieve it using the on_file_add event. In site_events.php I put this:

Events::extend('on_file_add', 'PdfScan', 'checkForPDF', 'libraries/import_pdf.php');


and in /libraries/import_pdf.php I have this:

class PdfScan extends Object {
    public function checkForPDF($f, $fv) {
        $fh = Loader::helper('file');
        // lower-case the extension so "report.PDF" is matched too
        $ext = strtolower($fh->getExtension($fv->getFileName()));
        if ($ext === 'pdf') {
            // do something
        }
    }
}


That does the trick!

jshannon replied:
My example is within a package, but c5 is typically pretty good about allowing you to do things without packages in the root (themes, model overrides, etc).

Personally, I can't think of any real pros or cons of events versus the file handler. I guess the file handler is just a more elaborate take on events. I'd look forward to anybody else's thoughts on this.

You might also want to hook into on_file_version_add as well (or instead). That covers the edge case where somebody updates the PDF in a way that changes the data you're saving.

melat0nin replied:
Yeah it seems a bit more 'API' to use the File Handler than a simple comparison of extensions, but I agree there doesn't seem to be much difference if the triggering is this simple.

Thanks for the heads up about on_file_version_add -- makes perfect sense!

jshannon replied:
Oh. Something just occurred to me as I was working on a similar solution.

The file handler will get called on "rescan" (it's a button in the properties dialog, and possibly called from elsewhere) whereas an "on_add" event obviously won't.

jshannon replied:
Also, the event doesn't fire on "replace"... so you're kinda SOL if a new file version is added.

melat0nin replied:
The on_file_version_add event seems a bit odd -- it doesn't appear to pass a file version ($fv) through to the event method, whereas on_file_add does.

I can get the script to fire on_file_version_add but it's not actually saving any data to the file's attribute, unlike when the file is first uploaded.

I've got one event (on_file_version_add is fired both when a file is first uploaded and when it's replaced):

Events::extend('on_file_version_add', 'DocumentScan', 'scanDocument', 'libraries/parse_document_text.php');


and in my library I've got this:

class DocumentScan extends Object {
    public function scanDocument($f, $fv = null) {
        $string_rep = FileAttributeKey::getByHandle('raw_text');
        $fh = Loader::helper('file');
        // fall back to the file object when no file version is passed in
        $fileObj = $f;
        if ($fv != null) {
            $fileObj = $fv;
        }
        $ext = strtolower( $fh->getExtension($fileObj->getFileName()) );
        // escape the path so spaces or shell metacharacters can't break the command
        $file = escapeshellarg($fileObj->getPath());
        $scan_exts = array('doc','docx','rtf','txt');
        $text = '';
        if ( $ext == 'pdf' ) {
            $text = shell_exec("pdftotext {$file} -");
        } elseif ( in_array($ext, $scan_exts) ) {
            $text = shell_exec("unoconv --stdout -f txt {$file}");


You can see I'm just passing the file object, rather than file version object, when $fv is null (which happens when on_file_version_add is fired). The problem is obviously there, because I need to change the attribute value on the new file version object, but I don't seem to get a hold of it!

Any thoughts?

jshannon replied:
Yeah. Use the FileTypes...

I recently (since my earlier posts) had to do something with new files. I found events SUPER buggy. It's hard to remember, but basically the object is populated, but not all of its information is (even though the code makes it look like it would be). If I recall, the solution was to get the FileID, then get an entirely new object based on that ID (so, clearly, the info had been created and saved by the time the event was called).

No idea what was wrong, though I recently started working on a project and found the client had already done a similar workaround, so it's not just me...

Try that. Or, rather, just use the FileType.
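
The reload workaround mentioned above might look roughly like this. It is a sketch, not tested against any particular concrete5 version: it assumes File::getByID(), getFileID() and getRecentVersion() behave as their names suggest, and reloadFileVersion() is a hypothetical helper name.

```php
<?php
// Sketch only: inside an event handler, reload the file by its ID instead of
// trusting the (possibly half-populated) objects the event passes in.
// Assumes concrete5's File::getByID(), getFileID() and getRecentVersion();
// reloadFileVersion() is a hypothetical helper, not part of the core.
function reloadFileVersion($f) {
    // Fetch a fresh File object using only the ID from the event's object.
    $fresh = File::getByID($f->getFileID());
    if (!is_object($fresh)) {
        return null; // nothing usable came back
    }
    // Ask the fresh object for its newest version.
    return $fresh->getRecentVersion();
}
```

Inside the event method you would then call $fv = reloadFileVersion($f) and set the attribute on that object rather than on whatever the event passed in.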

melat0nin replied:
Nice one, it works really well!

I put the FileTypeList definitions in site_events (not sure if that's the best place, but it keeps them within the bootstrapping of the site, which makes some sense I think), then amended my method slightly and put it in /libraries/file/types.

Replacing docs works to rescan and update attributes and everything seems dandy.

Cheers!