Exporting many form results to CSV - Before I File a Bug Report

Permalink
V8.5.0

We have a form with 2500 results which refuses to export fully unless I up the PHP execution time (fair enough sometimes) and the memory limit to 1GB which is definitely NOT fair enough. This is stock C5 forms and dashboard export from reports.

My assumption is that the code returning the form results is loading the entire result set into memory rather than streaming it from a DB buffer out to CSV in a loop which should be able to deal with a very large number of results without hammering server RAM.

If this is the case then that is extremely undesirable behaviour, especially if as few as 2500 records needs the PHP memory limit upping from an already robust 256Mb.

I wanted to check, before submitting this as a bug, that the very latest version of C5 doesn't have this already addressed.

This is a site dealing with COVID-19 regional support and having the reports fall over like this is unacceptable. Likewise we don't have infinite server RAM. At this rate we'll be out of RAM again in 6 days, I only upped it from 128Mb to 256Mb yesterday.

Normally we'd build out our own DB tables for stuff like this but obviously a rapid build was required so we're not using our own DB routines which definitely would stream results row-by-row to CSV tmp file on disk before sending to the client rather than attempting to load them en masse.

surefyre
View Replies:
MrKDilkington replied on at Permalink Reply
MrKDilkington
Hi surefyre,

It appears that the CSV export for form results is a streamed response.
https://github.com/concrete5/concrete5/blob/develop/concrete/control...

Is there a lot of information in each result?
surefyre replied on at Permalink Reply
surefyre
Not at all, about 23 fields to the form.

I suspect that somewhere down the rabbit hole of streamedResponse, which I just looked at but don't have time to wade through, there's a fetchAll() happening on the DB to retrieve the Express entries for the form results. This appears to be the 'default' behaviour in C5 for many things where I've dealt with DB stuff that isn't of my own making.

If it were truly streamed in the sense of row-by-row from the query result then there's no way I'd need to alter the php memory limit from 128Mb to past double that as these form entries grew in number.

I'm sure streamedResponse does what it says on the tin but elsewhere the data to be streamed is being loaded in one huge chunk.
JohntheFish replied on at Permalink Reply
JohntheFish
Looking at the code there is no fetchAll, but as you observe I also am not convinced it is actually streamed row by row, just assembled row by row before being streamed. Within that you have the overhead of, for each row, every express column being loaded is another attribute object involving key and value objects and multiple queries.

Its a classic example of obsessive object fandom obfuscating and retarding what should have been a few lines of code in a simple while loop zipping through a single table.
surefyre replied on at Permalink Reply
surefyre
Hi John, yes so IMHO that would be a bug, something we'd definitely stick on a snagging list here if a dev did that.

Should it be rasied do you feel? It's the kind of thing that makes C5 unsuitable for sites that have a lot of contact or support form submissions, which must be quite a number I'd have thought. Not all C5 sites will be low 'hobby' traffic.

Certainly feels like a bug to me.
JohntheFish replied on at Permalink Reply
JohntheFish
I think there are actually 2 issues that compound each other.
1. The immediate issue of inefficient csv creation
2. That Express as a concept is fundamentally unsuited to complex data or large data sets

(2) Is compounded because new features for more complexity keep on being added to express, consequently encouraging sites to expand their use of it when they should really be looking at developing dedicated tables.

You may get somewhere filing a bug report on github for (1). But there is too much intellectual baggage invested to get anywhere with (2).
surefyre replied on at Permalink Reply
surefyre
Thanks, John. I filed a bug report anyway as it's obviously a serious shortcoming.

If it gets any attention, well that's another matter!

A serious aspect is that it fails silently if it runs out of memory so people will unwittingly be getting incomplete reports out of the dash.