Improving importing/exporting

Permalink 1 user found helpful
I'm wondering if there are any plans to improve the core importer and exporter classes? Having done some work in this area lately, page importing, startingpoints etc, I have found a few consistent problems in this area:

1) Importer really only works on an empty site. Which may be by design given AFAIK core only uses it during installation of a starting point package. For sites already built, this library is only useful if extended, and then still not that useful given it is sort of a "dumb importer" meaning it doesn't do any checking to see if objects being imported already exist, or check for dependencies. Bottom line if you use the importer functions on any live C5 site chances are there will be some type of error.

2) Exporter class seems to run into errors on live sites as well, though it seems to work fine on freshly installed sites. Generally the functions in exporter use object exports and those *should* be reliable, yet I'm seeing and hearing about a lot of situations where they fail. By fail I mean things like trying to use various functions with an object missing or not having a correct id, or something else. I suppose there are a myriad of reasons why objects would fail to export, the problem I feel is that again like with importer class, exporter class is dumb. It doesn't do any smart checking or testing, it just tries to export a thousand objects from a site and if any of them have problems it fails with an error.

3) Neither import/export currently seem to provide any help or methodology for "generic import/export" as in being able to provide a general framework for import/export of other objects such as those created by addon packages. I feel this is important because while we can never predict what other objects addons will create, we could provide addon developers with a generic set of methods they should use to make their data importable/exportable.

The reason I feel importing and exporting of data deserves focus is because while "good importing" of data or "easy migrating of sites" etc. may not be flashy features to convince somebody to use C5, but for developers on many projects having to do work in this area it can amount to a significant time sink. And most CMS systems have this problem so it's not unique to C5. Wordpress has limited data export capability. Drupal has spotty object exporting that is inconsistent at best and error prone at worst. This is actually an area where C5 can stand apart. WP and Drupal would find it almost impossible to get this really right... but C5 could get it right. The underlying infrastructure of C5 is *somewhat* supportive of import/export. But thought must be given mainly to how to handle failures on import/export.

So here is my goal in a nutshell, knowing we cannot easily achieve "perfect importing" let's aim for "error free export/import". A library that is *virtually* guaranteed to run error-free, and where any failures are detected and skipped. The result is if you export a C5 site, you might not get 100% of the data, but your export will succeed and there will be an error log showing all parts that failed. Same deal on import, import what you can, log failure on parts that can't. Bottom-line, it's not perfect but it works consistently. This allows developers to rely on it, and focus on how on adding on fixes for project-specific problems.

View Replies:
goldhat replied on at Permalink Reply
Here is an example of how exportability can be compromised in the current 5.6.2.1 that came up recently during PRB of a package Page Importer. The issue is that if you use an Image Attribute in a C5 page, then later delete that image file, the export of pages will fail because it will attempt to run functions on the non-existent file object.

There are basically 3 places where data integrity or data integrity checks fails to preserve export functions in this case:

1) File manager allows deletion of a file in use in an image attribute. The reference to the file is abandoned, now the page export refers to a file that does not exist. Even if this did not break exporting, it would *likely* break importing.

2) Core exporter page export function fails to check if files exist when exporting pages with image file attributes.

3) Export function in the AttributeImage controller fails to check if the file exists before performing operations on the file which may not exist leading to fatal error.

---

Let's break this down a little more.

In the core Exporter (concrete/core/libraries/content/exporter.php) content pages are exported inside function run() at line 25:

// now content pages
$pages = $this->x->addChild("pages");
$db = Loader::db();
$r = $db->Execute('select Pages.cID from Pages left join ComposerDrafts on Pages.cID = ComposerDrafts.cID where ComposerDrafts.cID is null and cIsTemplate = 0 and cFilename is null or cFilename = "" order by cID asc');
while($row = $r->FetchRow()) {
  $pc = Page::getByID($row['cID'], 'RECENT');
  $pc->export($pages);
}


In the code above the relevant part to this issue is that in the while loop $pc->export is the export() function in the core Pages Model (concrete/core/models/page.php) at line 682. The except where attributes for the page are exported is shown below.

$attribs = $this->getSetCollectionAttributes();
if (count($attribs) > 0) {
  $attributes = $p->addChild('attributes');
  foreach($attribs as $ak) {
    $av = $this->getAttributeValueObject($ak);
    $cnt = $ak->getController();
    $cnt->setAttributeValue($av);
    $akx = $attributes->addChild('attributekey');
    $akx->addAttribute('handle', $ak->getAttributeKeyHandle());
    $cnt->exportValue($akx);
  }
}


The key part of the code above is $ak->getController() gets the controller for the attribute which for basic attributes might be the main Attribute Controller, but for special attributes like Image Files it will be a specialized controller. In this case the controller for image files is concrete\core\models\attribute\types\image_file.php and it contains an adapted version of the exportValue() function that is called. This function is shown below.

public function exportValue($akn) {
  $av = $akn->addChild('value');
  $av->addChild('fID', ContentExporter::replaceFileWithPlaceHolder($this->getValue()->getFileID()));
}


Note the call to $this->getValue()->getFileID(). getValue() function attempts to load the image file record from the database. In this chained call, even if the image file is not in the database, the call to getFileID() happens leading to a fatal error.

---

In closing, should it be the job of the C5 user to always carefully check if their images are in use and try to avoid deleting them if they are? We know typical users won't always be so diligent and even if they try to be, mistakes can and will happen. Should the export class check for files before calling functions to try to export them?

Given this is just one example, and there are probably hundreds and maybe thousands of others, what is generally the best way to handle this type of issues? Thoughts, ideas?
JohntheFish replied on at Permalink Reply
JohntheFish
Chained calls are sloppily used as a convenience where there are edge cases that the chain will break in such a manner. Where such are found, I think they need to be reported as bugs and ideally a fix added on github.

The key decision to make in any fix is what to do if the object doesn't exist.

a) Just skip the problem code and return null or a default (what will the consequence be?).

b) Report an error (then what?).

c) Throw an exception, which may fail later anyway if there is no exception handler (and possibly what the original programmer thought when chaining the calls, so didn't bother).

d) If the object doesn't exist, then everything will fail anyway sooner or later, so may as well let it fail now when we know what the first instance is (another possibility for the original programmer's thought process).