Why progress estimates are difficult
Feb. 27th, 2019 06:03 pm

Someone at $JOB said that they really wished that rsync could give a fairly close estimate of how long a given operation would take to complete. I had to jump in...
Be careful what you wish for.
Especially that "close" in there, which is a disastrous request!
AIUI...
It can't do that, because the way it works is to compare files on source and destination block by block, to work out whether they need to be synched at all.
To give an estimate, it would have to do all that work twice, which would defeat the purpose. Rsync is not a clever copy program. Rsync exists to sync two files, or two groups of files, without transmitting all the data they contain over a slow link; doing the comparison up front just to produce an estimate would obviate its raison d'être.
If it just looked at file sizes, the estimate would be wildly pessimistic, which would have made the tool far less attractive; it would never have been used, and never have become a success.
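To make that concrete, here's a minimal sketch in Python (my own invented names, nothing like rsync's real code) of what even the crude size-based estimate costs: a whole extra traversal before any real work starts, and the byte total it produces still wildly overstates what rsync would actually send.

    import os

    def total_bytes(root):
        """One full pass of directory I/O, done only to size the job."""
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # file vanished mid-scan; the estimate was fuzzy anyway
        return total

    def sync_with_estimate(src, dst):
        estimate = total_bytes(src)    # pass 1: a whole extra traversal
        print(f"roughly {estimate:,} bytes to consider")
        done = 0
        for dirpath, _dirnames, filenames in os.walk(src):  # pass 2: the real work
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    done += os.path.getsize(path)
                except OSError:
                    continue
                # ... the actual block-by-block comparison against dst would go
                # here; most blocks will match and never cross the wire ...
                print(f"\r{100 * done // max(estimate, 1)}% of bytes examined", end="")
        print()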
By comparison: people clearly asked the Windows developers for this, and commercial software being what it is, they got it.
That's how on Win10 you get a progress bar for all file operations. Which means deleting a 0-byte file takes as long as deleting a 1-gigabyte file: it has to simulate the action first, in order to show the progress. So everything now has a built-in multi-second delay, far longer than the actual operation, just so it can display a fancy animated progress bar and draw a little graph; nothing happens instantly, not even the tiniest operations.
Thus a harmless-sounding UI request completely negated the hard work that went into optimising NTFS, which, for instance, stores tiny files inside the file-system indices themselves, so they take no disk sectors at all and need less head movement to reach.
All wasted because of a UI change.
Better to have no estimate than a wildly inaccurate estimate or an estimate that doubles the length of the task.
Yes, some other tools do give a min/max time estimate.
There are indeed far more technically-complex solutions, like...
(I started to do this in pseudocode but I quickly ran out of width, which tells you something)
* start doing the operation, but also time it
* if the time is more than (given interval)
* display a bogus progress indicator, while you work out an estimate
* then start displaying the real progress indicator
* while continuing the operation, which means your estimate is now inaccurate
* adjust the estimate to improve its accuracy
* until the operation is complete
* show the progress bar hitting the end
* which means you've now added a delay at the end
So you get a progress meter that only appears for longer operations, but it delays the whole job.
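Here's that list as a minimal Python sketch (invented names, and certainly not how Vista actually implemented it): time the work as it runs, show nothing for quick jobs, and keep revising an estimate that is inaccurate by construction, because the past rate is all you have.

    import time

    def run_with_adaptive_eta(items, process, patience=2.0):
        """Process a list of work items; only show an ETA once the job
        has proven itself slow."""
        start = time.monotonic()
        for i, item in enumerate(items, 1):
            process(item)
            elapsed = time.monotonic() - start
            if elapsed < patience:
                continue              # quick jobs never display anything
            rate = i / elapsed        # items per second so far
            remaining = (len(items) - i) / rate
            # This assumes the past rate holds, which it rarely does:
            # file sizes, caches and link speed all drift as you go.
            print(f"\r{i}/{len(items)} done, ~{remaining:.0f}s left ", end="")
        print("\rdone" + " " * 20)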
This is what Windows Vista did, and it was a pain.
And as we all know, for any such truism there is an XKCD.
https://xkcd.com/612/
That was annoying. So in Win10, someone said "fix it". The result: it now takes a long time to do anything at all, but there's a nice progress bar to look at.
So, yeah, no. If you want a tool that does its job efficiently and as quickly as possible, don't try to put a time estimate in it.
Non-time-based, non-proportional progress indicators are fine.
E.g. "processed file XXX", which increments, or "processed XXX $units_of_storage".
But they don't tell you how long it will take, and that annoys people. They ask: "if you can tell me how much you've done, can't you tell me what fraction of the whole that is?" Well, no, not without doing a potentially big operation before beginning work, which makes the whole job bigger.
And the point of rsync is that it speeds up work over slow links.
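Such an indicator is trivial, which is rather the point. A sketch (hypothetical names) of a wrapper that reports only what has already happened, so it needs no up-front scan and makes no promises:

    def with_counter(work_items):
        """Wrap an iterable of (name, size) work items, printing a running
        count; no total, no percentage, no prediction."""
        files = done = 0
        for name, size in work_items:
            yield name, size
            files += 1
            done += size
            print(f"\rprocessed file {files:,} ({done:,} bytes)", end="")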
Summary:
Estimates are hard. Close estimates are very hard. Making the estimate makes the job take much longer (generally, at a MINIMUM twice as long). Poor estimates are very annoying.
So, don't ask for them.
TL;DR Executive summary (which nobody at Microsoft was brave enough to do):
"No."
This was one of those things that for a long time I just assumed everyone knew... then it became apparent over the last ~dozen years (since Vista) that lots of people didn't know, and indeed that this lack of knowledge was percolating up the chain.
The time it hit me personally was upgrading a customer's installation of MS Office XP to SR1. This was so big, for the time -- several hundred megabytes, zipped, in 2002 and thus before many people had broadband -- that optionally you could request it on CD.
He did.
The CD contained a self-extracting Zip that extracted into the current directory. So you couldn't run it directly from the CD. It was necessary to copy it to the hard disk, temporarily wasting ¼ GB or so, then run it.
The uncompressed files would have fitted on the CD. That was a warning sign: several people had failed at attention to detail and basic checks.
(Think this doesn't matter? The tutorial for Docker instructs you to install a compiler, then build a copy of MongoDB (IIRC) from source. It leaves the compiler and the sources in the resulting container. This is the exact same sort of lack of attention to detail. Deploying that container would waste a gigabyte or so per instance, and thus waste space, energy and machine time, and cause over-spend on cloud resources.
All because some people just didn't think. They didn't do their job well enough.)
So, I copied the self-extractor, I ran it, and I started the installation.
A progress bar slowly crept up to 100%. It took about 5-10 minutes. The client and I watched.
When it got to 100%... it went straight back to zero and started again.
This is my point: progress bars are actually quite difficult.
It did this seven times.
The installation of a service release took about three-quarters of an hour, plus the 10 minutes wasted because an idiot put a completely unnecessary download-only self-extracting archive onto optical media.
The client paid his bill, but unhappily, because he'd watched me wasting a lot of expensive time because Microsoft was incompetent at:
[1] Packaging a service pack properly.
[2] Putting it onto read-only media properly.
[3] Displaying a progress bar properly.
Of course it would have been much easier and simpler to just distribute a fresh copy of Office, but that would have made piracy easier. This product is proprietary software and one of Microsoft's main revenue-earners, so it's understandable that they didn't want to do that.
But if the installer had just said:
Installation stage x/7:
Progress: [XXXXXXXXXX..........]
That would have been fine. But it didn't. It went from 0 to 100%, seven times over, probably because first the Word team's patch was installed, then the Excel team's patch, then the PowerPoint team's patch, then the Outlook team's patch, then the Access team's patch, then the file import/export filters team's patch, etc. etc.
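For what it's worth, rendering that display is a few lines of code. A sketch of the sort of thing the installer could have shown (my own formatting, assuming the seven sub-installers were known up front):

    def show_stage(stage, stages, frac, width=20):
        """One overall line: which stage we're in, plus a bar within it."""
        filled = int(frac * width)
        bar = "X" * filled + "." * (width - filled)
        print(f"\rInstallation stage {stage}/{stages}:  [{bar}]", end="")

    show_stage(3, 7, 0.5)   # halfway through the third of seven patches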
Poor management. Poor attention to detail. Lack of thought. Lack of planning. Major lack of integration and overview.
But this was just a service release. Those are unplanned; if the apps had been developed and tested better, in a language immune to buffer overflows and one which didn't permit pointer arithmetic and so on, it would never have been necessary at all.
But the Windows Vista copy dialog box, as parodied in XKCD -- that's taking orders from poorly-trained management who don't understand the issues, because someone didn't think it through or explain it, or because someone got promoted to a level at which they were incompetent.
https://en.wikipedia.org/wiki/Peter_principle
These are systemic problems. Good high-level management can prevent them. Open communications, where someone junior can point out issues to someone senior without fear of being disciplined or dismissed, can help.
But many companies lack this. I don't know yet if $DAYJOB has sorted these issues. I can confirm from bitter personal experience that my previous FOSS-centric employer suffered badly from them.
Of course, some kind of approximate estimate, or incremental progress indicator for each step, is better than nothing.
Another answer is to concede that the problem is hard, and display a "throbber" instead: show an animated widget that shows something is happening, but not how far along it is. That's what the Microsoft apps team often does now.
Personally, I hate it. It's better than nothing but it conveys no useful information.
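A throbber is also trivial to build. A sketch (hypothetical; is_done stands in for whatever completion flag the real work sets):

    import itertools
    import sys
    import time

    def throb(is_done, delay=0.1):
        """Spin a cursor until is_done() returns True: it proves the
        program is alive, and tells you nothing else."""
        for ch in itertools.cycle("|/-\\"):
            if is_done():
                break
            sys.stdout.write("\r" + ch)
            sys.stdout.flush()
            time.sleep(delay)
        print("\rdone")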
Building an accurate estimator from speed measurements taken as the job runs is also significantly tricky, and it can slow down the whole operation. Me, personally, I'd prefer an indicator that says "stage 6 of 15, copying file 475 of 13,615".
I may not know which files are big or small, which stages will be quick or slow... but I can see what it's doing, I can make an approximate estimate in my head, and if it's inaccurate, well, I can blame myself and not the developer.
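That indicator costs almost nothing, because it only reports counts the tool already has on hand (a hypothetical sketch):

    def stage_status(stage, stages, file_no, files):
        """Counts the tool already knows; no time prediction at all."""
        print(f"\rstage {stage} of {stages}, "
              f"copying file {file_no:,} of {files:,}", end="")

    stage_status(6, 15, 475, 13_615)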
And nobody has to try to work out what percentage of an n-stage process with o files of p different sizes they're at. That's genuinely hard to compute, and the layer asking for the number may not even be given a correct file count. Get the total wrong in one direction and the bar hits 87% and suddenly ends; wrong in the other and it sails past to 106%; weight the stages wrongly and it sits at 42% for an hour, then does the rest in 2 seconds.
I'm sure we've all seen all of those. I certainly have.