MinAveRange exercise

Hi all,

It's about a task on Codility saying summarily:
Find the range (with at least two elements) in an array A (with the following specs) that contains the minimal average of the numbers and return the index of the first element of that range:
Array A contains at least 2 numbers. The numbers are int in range [-10'000, +10'000]. For instance for A {5, 2, 2, 7, 1, 8} the return value should be 1 because (A[1] + A[2])/2 is the minimum range.


I wrote the following simple, correct but inefficient alog, O(n^2).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int solution(std::vector<int>& A) {

	int index{ 0 };
	double min{ (A[0] + A[1]) / (double)2 };

	for (int i = 0; i < A.size(); ++i) {
		double r = A[i];
		for (int j = i + 1; j < A.size(); ++j) {
			r += A[j];
			double temp = r / (j - i + 1);
			if (temp < min) {
				min = temp;
				index = i;
			}
		}
	}
	return index;
}


I thought of it and the prior algorithms I had in mind for hours but couldn't find a more efficient algorithm for the task and what I need is an "idea". What is/are your idea(s), please?
Last edited on
Can you construct an example where a range with 3 (or more) elements would actually have a smaller average than the best (smallest average) 2-element range?
Last edited on
If you have a sequence 1 2 1 then the average of 1 2 (or 2 1) is 1.5 but the average of 1 2 1 is 1.33 - which is < the 2 elem average.
You are right.

Then I think it is necessary to check every possible sequence-length for every possible start index.

This comes down to O(n^2).
Last edited on
There should certainly be a better solution. And yes, as seeplus pointed out, a larger series of numbers may have smaller average.
Well, you can limit the maximum length of the range to be tested to a fixed value, which makes the "inner" loop fixed time, because number of iterations then will not depend on n. Overall, you'd get a complexity of O(n).

...but then you might miss an even better result, that might be produced by an even longer range 😕

If you want to be sure to always find the "best" possible result, I don't see a way to break the "inner" loop early, since adding even more elements may always (at least potentially) improve the current result.
Last edited on
Here's a O(n) time, O(1) space solution (I think it's a solution):

In case you didn't just want code, the idea is to notice how the the solution for the entire array depends on the solution to a smaller part of the array.

Consider a simpler problem which asks you to compute the minimum average directly. So for example the answer to minavg({5, 2, 2, 7, 1, 8}) would be 2 because the minimum average is (2+2) / 2.

Then think about it from a bottom-up perspective, trying to figure out how the base case minavg({5, 2}) can be used to express the slightly larger problem minavg({5, 2, 2}). Essentially, try to notice that minavg({5, 2, 2}) is the minimum of minavg({5, 2}) = (5+2) / 2 = 3.5 and minavg({2, 2}) = 2. That's the idea of the solution below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#include <vector>
#include <iostream>

std::vector xs{ 5, 2, 2, 7, 1, 8 };
  
int main()
{
  struct pq { int p, q; double avg() const { return static_cast<double>(p) / q; } };   

  pq min_current { xs[0] + xs[1], 2 }; 
  pq min_total   { xs[0] + xs[1], 2 };
  int end_index = 2;
  
  for (int i = 2; i < xs.size(); ++i)
  {
    pq const a { min_current.p + xs[i], min_current.q + 1 };
    pq const b { xs[i - 1] + xs[i], 2 };
    min_current = a.avg() < b.avg()? a: b;
    
    if (min_current.avg() < min_total.avg())
    {
      min_total = min_current;
      end_index = i + 1;
    }
  }
  
  int const begin_index = end_index - min_total.q;
  std::cout << "the average is " << min_total.avg() << '\n'
            << "the contiguous subarray begins at index " << begin_index << '\n'
            << "the contiguous subarray has length " << min_total.q << '\n'
            << "the subarray is { ";
  for (int i = begin_index; i < end_index - 1; ++i)
    std::cout << xs[i] << ", "; 
  std::cout << xs[end_index - 1] << " }\n";
}
Last edited on
The best solution is O(N).

As length 1 isn’t allowed (if it was then this would be best) then you only have to look at subsequences of length 2 or 3.

The average of two subsequences (A, B) can’t be smaller or larger than the smallest or largest averages of those individually. Inductively, then, that takes you down to lengths 2 or 3 (since 1 isn’t allowed).

The solution isn’t unique. (E.g. consider a constant sequence.)
Last edited on
@Mbozzi, the solution is correct and O(n), really great, and thanks.
I only tested it and didn't read the code because I like to firstly figure out the idea well.
So for example the answer to minavg({5, 2, 2, 7, 1, 8}) would be 2 because the minimum average is (2+2) / 2.
Then think about it from a bottom-up perspective, trying to figure out how the base case minavg({5, 2}) can be used to express the slightly larger problem minavg({5, 2, 2}). Essentially, try to notice that minavg({5, 2, 2}) is the minimum of minavg({5, 2}) = (5+2) / 2 = 3.5 and minavg({2, 2}) = 2.
Do you mean that in A{5, 2, 2, 7, 1, 8}, we should first compute the average for the primary two elements (5, 2 = 3.5), and then do the same for the second and third elements (2,2 = 2), then compare the min averages and pick the smaller which is 2, and continue that using the third and forth element and so forth up to the end of the array?
That works for the given array but not different ones, as I had tested this, so I'm sure you meant something else.
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <iostream>
#include <vector>
using namespace std;

int solution( const vector<int>& A )
{
   double minAverage = 0.5 * ( A[0] + A[1] );
   int minPos = 0;
   for ( int k : { 2, 3 } )                                          // O(1)
   {
      for ( int i = 0; i + k <= A.size(); i++ )                      // O(N)
      {
         double sum = A[i];
         for ( int j = i + 1; j < i + k; j++ ) sum += A[j];          // O(1)
         double average = sum / k;
         if ( average < minAverage )
         {
            minAverage = average;
            minPos = i;
         }
      }
   }
   return minPos;
}

int main()
{
   cout << solution( { 5, 2, 2, 7, 1, 8 } ) << '\n';   // Note: returns the POSITION of the START of the min-average sequence
}

Last edited on
I only tested it and didn't read the code because I like to firstly figure out the idea well.

As lastchance pointed out, the "best" (smallest average) range cannot be longer than 3 elements. With the minimum length being 2, you only have to test 2 and 3. It's as simple as that. My concern that you may miss some better results when limiting the maximum length of the range was unfounded ;-)

This makes your "inner" loop O(1). With the "outer" loop still being O(n), you get complexity O(n) overall.
Last edited on
Let's first work this small issue out.
The average of two subsequences (A, B) can’t be smaller or larger than the smallest or largest averages of those individually.
Suppose:
A: {5, 2, 2}
B: {7,1}
The average of A, B is 17/2 = 8.5 (or 17/5 = 3.4)
Average of A is 3
Average of B is 4
So how does the above statement make sense?
You are looking for the minimum average.

Combination of A,B is {5,2,2,7,1} - this has average 17/5=3.4
A is {5,2,2} - this has average 9/3 = 3
B is {7,1} - this has average 8/2 = 4

The minimum-average subsequence (here) is A. That has THREE elements.



The best subsequence overall is {2,2} - this has average 4/2 = 2 and starts at index 1.



For a general proof, if A has nA elements and B has nB elements then the combined sequence AB has average
(nA.<A> + nB.<B>) / (nA+nB)

where <.> denotes an average. Thus the average of the combined sequence is
r<A> + (1-r)<B>

where r is a number between 0 and 1.

Hence, <A,B> lies between <A> and <B>.

Hence, the minimum of <A>, <B> and <A,B> must be one of <A> or <B> - i.e. the smaller parts. Since you aren't allowed a small part of length 1, continuing downward inductively the smallest average (or the largest average, for that matter) must occur for a subsequence of length 2 or 3.


Last edited on
I know these, but just liked to understand your statement. Based on that, 3.4 can't be smaller or larger than the smallest or largest averages of those individually, which are 3 and 4. But 3.4 is larger than 3 and smaller than 4! Something went wrong I think.
But 3.4 is larger than 3 and smaller than 4! Something went wrong I think.


@Frek, you don't understand your own problem. The smallest of 3.4, 3 and 4 is ... 3. This is the average of the length-3 subsequence, not the length-5 one.
{1,1,2}: {1,1} has lowest average
{1,2,2}: {1,2} has lowest average
{1,2,1}: {1,2,1} has lowest average
{2,1,2}: {2,1} (and {1,2}) has lowest average

Only one triplet of those has lower element than their two element subsequences.

How could four elements have lower average? Lets try:
{0,1,1,2}: No, the {0,1} wins.
{1,1,2,1}: No, the {1,1} wins.
{1,2,1,2}: No, the {1,2,1}

(Random tries are no mathematical proof though.)

lastchance did prove that: <A> <= <A,B> <= <B> (when <A> <= <B>)
In this case we cannot report A if it has only one element. That is what makes the (rare) three element lowest average possible here.
Last edited on
When working on a solution, a couple of times I was quite close to the idea that no more than 3 elements in a range can have a lower average than all the smaller subranges, but unfortunately I didn't continue working on that point and thought it'd be better to get the idea from here. My last confusion was about lastchance's statement. Still I can't get that right unless it solely means the idea above.
After getting the idea, I wrote this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int solution(std::vector<int>& A) {

	double min{ 10'000 }, r1 = (A[0] + A[1]) / 2.0;
	size_t index{ 0 };

	for (size_t i = 0; i < A.size() - 2; ++i) {
		double r2 = (A[i + 1] + A[i + 2]) / 2.0;
		double r3 = (A[i] + A[i + 1] + A[i + 2]) / 3.0;
		double temp = std::min(std::min(r1, r2), r3);
		if (temp < min) {
			if (r2 < r1 && r2 < r3) index = i + 1;
			else index = i;
			min = temp;
		}
		r1 = r2;
	}
	return index;
} 
Last edited on
Topic archived. No new replies allowed.