Problems with importing many Express Objects (10k+)

Permalink
Currently I am developing a project based on concrete5. This project is playing with a lot of data. There are about 20 tables / entities.
Some of these are very small and have only a few entries. However, there are three mega tables with a huge number of entries. The „booking times“ table e.g. contains over 5 million entries.
In General I see no problem. Once the data is stored in the table, Doctrine can handle it pretty well. (Succesfully tested with 20,000 entries; Backend view was fluid)

But I have many troubles with importing the data:
When I import the data in the traditional way as described in the developer's documentation with Express:buildEntry(), I am getting out-of-time-out issues.

Personally, I see two solution approaches here:

1) Generate MySQL Insert String's that add multiple entries at once.

INSERT INTO myTable (column1, column2…) VALUES (.., ..), (.., ..), (.., ..), (.., ..), (.., ..), (.., ..)……


Here I have already created a proof on Concept Code for this. The performance is definitely much better. Benchmark on my machine: 10,000 entries in 2.5 seconds.

The problem is that, so many tables are included for creating just a single Express Entry and all depending on attributes and associations.

Of course I could spend a long time and implement a mega method that would handle this, but
I could imagine when I am finish that the performance will be just as bad as the object-oriented way because of the the much queries and conditions.

Other points that are against this method: This way would be vulnerable to multibyte SQL injections and would only works with MySQL driver.

So I'm not happy with this method I think there are better solutions.

2) Doctrine Bulk Processing

http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/re...

Has anybody some experience with that? I do not know if it is possible, but I could imagine that it possible to subclass e.g. the EntityManager class and temporarily disable the flush() method and then trigger this method in 50s-steps manually. What do you think of this approach?

Or maybe someone of your have even a better/other approach? I would really like to implement the project with Express because it simply brings many benefits.

fabianbitter
View Replies:
fabianbitter replied on at Permalink Reply
fabianbitter
Here is my normal import method (the default way).

private function testBigDataImport()
    {
        $pkg = Package::getByHandle($this->pkgHandle);
        $csvFile = $pkg->getPackagePath() . "/content_files/import_data/default_location_indicators.csv";
        $csv = new \parseCSV();
        $csv->delimiter = ",";
        $csv->parse($csvFile);
        $start = round(microtime(true) * 1000);
        $i = 0;
        $testTotal = 10;
        foreach ($csv->data as $row) {
            // @todo: solve problem with handling big data
            $entry = Express::buildEntry('location_indicator')
                ->setRecordType($row["record_type"])
                ->setTextIndicator($row["text_indicator"])


And this are the results:

Total Entries:   16169
Avg. Time:       757.9ms / Entry
Est. Total Time: 12254.49s
MrKDilkington replied on at Permalink Reply
MrKDilkington
Hi fabianbitter,

What you describe sounds useful.

I recommend creating a GitHub discussion issue for this.
fabianbitter replied on at Permalink Reply
fabianbitter
Good idea. I will create a thread there.
fabianbitter replied on at Permalink Reply
fabianbitter
@all: I have developed a composer package for fast importing data records into concrete5 Express Objets. This package reduces the import time per item from 750 ms up to 1 ms and less.

But after my Express evaluation i decided to use native Doctrine for my Project instead of Doctrine. The general performance is just too weak for big data projects. My result after 2 weeks of Express: For small projects + prototypes Express is very cool and useful but for big projects it is better to work with Doctrine / Native Sql.

Anyway. My import package is available under:
https://bitbucket.org/fabianbitter/concrete5_express_batch_importer...

Cheers :)