The purpose of this assignment is to get you to create some very simple threaded programs.
If you write these programs in C you will be able to use the kind of interface described in lecture 3, and it will be less effort than using most other languages. However, you may use Java.
static signed long long int
(that's for C,
for Java use static long
) and print the answer.
The correct answer is 5000005000000000.
Measure how long this takes (both CPU time and real time).
- Q. Sorry if this sounds silly, but I suppose we should turn off compiler optimisations for the assignments?
- A. This is an excellent question. The answer is no. Remember what I said in the lectures? The compiler is allowed to transform your program in seriously strange ways as long as the observable behaviour is unchanged. Several modern compilers analyse linear progressions quite cleverly in order to detect
- might we need a bounds check here?
- are these assignments to array elements independent?
With optimisation, I actually get
0.000 user + 0.000 system = 0.000 total in 0.005 real seconds.Again, remember what I said in the lectures? You can disable optimisation for a specific variable using ‘
volatile
’. Adding that to the declaration of the total produces this time, still with -O3 optimisation:
22.220 user + 0.020 system = 22.240 total in 22.308 real seconds.Here's another approach, which lets the compiler optimise somewhat, but hides what's going on.
- Declare an array of 1,000,000 integers.
- Fill it with the numbers 1 to 1,000,000 in any order.
- Shuffle it at random. See the Fisher-Yates shuffle at Wikipedia for a good algorithm. I use that a lot.
- Initialise the global variable.
- Do 1,000 times:
- Add each element of the array to the global variable.
Doing it that way, I get these times:
2.600 user + 0.030 system = 2.630 total in 2.757 real seconds.That tells us that we have successfully hidden the linear progression from the compiler, so it's not eliminating most of the program, but also that it is working with a local copy of the global total, not going to memory every time. (And the difference between 2.6 seconds and 22.2 seconds tells you a lot about the cost of memory.)
- Q. Also, is it OK to use "time" for cpu/real timings?
- By doing
#include <time.h> clock_t t0, t1; set up the array; t0 = clock(); compute the total; t1 = clock(); print total and (t1-t0)/CLOCKS_PER_SEC;you can measure the part of the program you care about and skip the rest; using the time(1) command will measure everything, including starting the program, dynamic linking, and printing. On the other hand, time(1) gives you real time and clock(3C) does not.
static volatile signed long long int
but
adding should still be done by total += x
.
Measure how long this takes (both CPU time and real time) for N = 1 to 8. Do you still get the correct answer?
Measure how long this takes (both CPU time and real time). Do you get the correct answer now?
atomic_fetch_add
(if available) to update an
atomic counter, or on a Macintosh or other BSD system uses
OSAtomicAdd64Barrier
(see 'man atomic') to do atomic
adds.
If you use Java, you'll want
java.util.concurrent.atomic.AtomicLong
Measure how long this takes (both CPU time and real time) for N = 1 to 8. Do you still get the correct answer?
Submit your assignment within two weeks by sending an
e-mail message to ok@cs.otago.ac.nz
with the
subject "COSC441 Assignment 1" and a .tgz file containing
exactly five files:
The report must contain your name (not login account) and student number.
It should contain your measured times in a table
version | N | CPU time | real time |
---|---|---|---|
single | N/A | 26.4 | 26.9 |
⋮ | ⋮ | ⋮ | ⋮ |
atomic | 8 | ?? | ?? |
It should contain your commentary on what you found.