April 2006 Archives

I had a long talk with a student last week. We talked about a number of things, including my role in the Auckland MBA programme and how grading is done. I've given grading a lot of thought over the years, so I thought I might start writing those thoughts down in the hope that I will eventually turn them into a paper.

I'll start by saying that, in my opinion, grading is not an exact science. Nevertheless, it is a system designed to produce reasonably consistent results. My personal view is that, overall, it is difficult, if not impossible, to reliably measure a student's achievement to 3 significant figures. Indeed, going beyond two produces an illusion of accuracy, and for some, going beyond a single figure stretches the credibility of the system; but, as will be seen later, such talk of figures may itself be nonsensical.

A simple definition

Grading is a process of evaluation whereby a student's work is assessed and given a grade. A student who completes a course typically ends up with a series of grades that are then combined to produce a result.

There are a number of approaches to grading. The two most prevalent are norm-referenced (or norm-based) grading and criterion-based grading.

Norm-based grading

In norm-based grading, students are ranked relative to the performance of others in the class. It is also known as grading on the curve, a name that comes from the way grades are assigned: the ranked work is fitted to a normal distribution (the classic bell curve). The consequences of norm-based grading include:

  • Increased competition within the class, which can result in higher performance
  • Increased pressure to perform, which can result in higher levels of plagiarism, cheating, and other undesirable behaviours
  • Relatively easy to administer and implement
  • The final grades are meaningless; they are a measure of ranking rather than of absolute performance
  • Competitive forces discourage collaboration and peer support, and can even lead to sabotage of peers' work
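To make the mechanics of grading on the curve concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any institution's procedure: each score is turned into a mid-rank percentile, mapped onto a standard normal curve with `statistics.NormalDist`, and cut into bands at hypothetical z-score cut-offs. The `CUTOFFS` list and the `curve_grades` function are my own inventions.

```python
from statistics import NormalDist

# Hypothetical z-score cut-offs: under a standard normal curve these give
# roughly 7% A, 24% B, 38% C, 24% D and 7% F. Real curve-graders choose their own.
CUTOFFS = [(1.5, "A"), (0.5, "B"), (-0.5, "C"), (-1.5, "D")]


def curve_grades(scores):
    """Grade each score by its rank within the class, not by its absolute value."""
    n = len(scores)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    grades = []
    for s in scores:
        below = sum(1 for other in scores if other < s)
        equal = sum(1 for other in scores if other == s)
        # Mid-rank percentile: always strictly between 0 and 1, as inv_cdf requires.
        pct = (below + 0.5 * equal) / n
        z = std_normal.inv_cdf(pct)
        grades.append(next((g for cut, g in CUTOFFS if z >= cut), "F"))
    return grades


open_entry = [55, 52, 50, 48, 45]  # hypothetical first-year scores
honours = [95, 92, 90, 88, 85]     # hypothetical honours-class scores
print(curve_grades(open_entry))
print(curve_grades(honours))
```

Because only rank matters, the two invented classes receive identical grades despite a 40-mark gap in absolute scores, which is the "measure of ranking rather than of absolute performance" problem listed above. Note also that in a class of five nobody reaches an A under these cut-offs.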

Nevertheless, norm-based grading systems are widely used, particularly in the United States.

For me, perhaps the most significant problem with grading on the curve is the meaninglessness, or arbitrariness, of the curve. There are few good explanations as to why any given class should produce a normal distribution. For example, compare an open-entry first-year class with an invitation-only honours class: why should their grade distributions be at all similar? Furthermore, as with other systems, calculating an overall grade, which effectively adds the effects of several curves together, doesn't seem to make much sense.

Criterion-based grading

In criterion-based assessment, work is graded against a scale that is determined before the assessment. This is the system used throughout much of the University of Auckland (and, dare I say, endemic throughout much of New Zealand's education system).
The consequences of this type of system are:

  • It provides little information about the relative performance of the student (it's interesting how often a class asks for the grade distribution so that they can see their relative performance).
  • It reduces competition between students.
  • It allows many students to achieve the same grade; for example, if 30% of the class meets the standards for an A+, that is what they get.
  • The objectives/scales can be mis-targeted: they may be set too high or too low, measure the wrong things, or fail to measure everything that is needed.
  • and as said elsewhere, "Because of tendency of learning expectations to be mismatched with real learning outcomes, encourages ad hoc grade adjustments, thus contributing to meaningless grades." (http://depts.washington.edu/grading/plan/procon.htm).
  • and, also from the same source "Unduly constrains curriculum development by discouraging the use of very short assignments and/or by encouraging teacher to force exam or assignment to fit into point system easily calculated into scale."

Overall, both systems have strengths and weaknesses (and it's useful to know what they are). Most of the rest of this entry focuses on criterion-based grading.

The use of broad criteria

Here, the broad grading list looks something like this:

Grade  Description
A+     Rare, outstanding
A      Exceptional; beyond what was expected
A-     Excellent
B+     Polished, very good
B      Covers everything expected; comprehensive; demonstrated good understanding
B-     Good coverage, minor flaws
C+     Demonstrated adequate understanding of the fundamentals but some gaps
C      Some understanding, but gaps
C-     Just adequate1
D+     Inadequate, lack of understanding
D      Very inadequate, lack of understanding
D-     Very poor

Neither part of the grading list (the letter grades or their descriptions) is free of contention. For example, the plus/minus system (e.g. A+, A, A-) is not universally accepted. Until quite recently, some well-known institutions, such as MIT and Stanford2, used only the letter grades (e.g. A, B, C). There are a number of arguments against using the plus/minus system. These range from concerns that it increases competition between students through to concerns about how reliably a lecturer can distinguish between the letter grade itself and its plus/minus variants3.

The verbal descriptions may also be considered contentious4, and from time to time faculty do discuss5 their exact meaning. Nevertheless, they are the descriptions the institution has accepted.

What is clear from this list is that grades represent an ordered series of categories. As soon as one accepts this, a number of issues arise.

  1. How does one combine a series of grades to arrive at an overall grade?
  2. How big do we expect the categories to be?
  3. Are the categories a relative measure or an absolute one? For example, is "Just adequate" for a first-year undergraduate the same thing as "Just adequate" for a final-year master's student?

These are not trivial matters, and they have a major impact on students, not only on their results but also on the amount of effort they put into their work.

Combining grades

For the moment, let's assume that students' achievements can be reliably assessed and appropriate grades awarded. How does one take a series of equally weighted grades, say A, A-, A, A, and arrive at a grade that truly represents the student's overall achievement? Remember, these are categories: it's like saying we have three apples and a pear; what do we have overall? (Or perhaps it is more like having three fruit and one vegetable.)

In the previous example, is the student an A student or an A- student overall? Many of the systems that assign a mark to each grade and then find the central tendency give the student an A- if averaging is used, or an A if the mode is used. To me, common sense calls for an A. But whilst common sense works here, what happens when there are more grades or a more varied distribution? Many people use the mean (average) to calculate the answer, but I would suggest that the mode is much more appropriate. Try it out: make up some patterns of grades and see which method gives the final grade that seems most sensible.
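Here is a minimal Python sketch of that experiment. It assumes the GPA scale listed further down this entry (A+ = 9 through C- = 1, with D grades worth 0), and it assumes the mean is truncated down to the grade whose points were fully reached; rounding to the nearest point instead would turn the 7.75 average for A, A-, A, A into an A. Breaking ties in the mode towards the higher grade is also my assumption, and the function names are hypothetical.

```python
from statistics import mean, multimode

# Grade points from the GPA scale in this entry (all D grades are worth 0).
POINTS = {"A+": 9, "A": 8, "A-": 7, "B+": 6, "B": 5, "B-": 4,
          "C+": 3, "C": 2, "C-": 1, "D+": 0, "D": 0, "D-": 0}

# Reverse lookup from points to grade. The three D grades share 0 points,
# so 0 is simply reported as "D" (an assumption for illustration).
GRADE_FOR = {points: grade for grade, points in POINTS.items() if points > 0}
GRADE_FOR[0] = "D"


def overall_by_mean(grades):
    """Average the grade points, then truncate to the grade fully reached."""
    average = mean(POINTS[g] for g in grades)
    return GRADE_FOR[int(average)]  # int() truncates, e.g. 7.75 becomes 7 (A-)


def overall_by_mode(grades):
    """Take the most frequent grade; ties go to the higher grade (an assumption)."""
    most_common = multimode(grades)
    return max(most_common, key=lambda g: POINTS[g])


pattern = ["A", "A-", "A", "A"]
print(overall_by_mean(pattern), overall_by_mode(pattern))  # A- A
```

A split pattern such as A+, C- shows how far the two methods can diverge: the truncated mean gives a B (an average of 5 points), while the mode, with this tie-break, gives an A+.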

But in doing all of this, we have ignored the question of how we assign points to grades in order to do these calculations. Should it be A = 3, B = 2, C = 1, or should it be A = 10, B = 5, C = 1? In other words, how much harder is it to get an A than a C?
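A small Python sketch makes the question concrete. The two scales are the hypothetical ones from the paragraph above, and the two students are invented for illustration. Under the linear 3/2/1 scale a student with B, B and a student with A, C have the same average, but under the steeper 10/5/1 scale the uneven student comes out ahead, so the choice of scale alone can change the ranking.

```python
from statistics import mean

# The two hypothetical point scales from the text.
LINEAR = {"A": 3, "B": 2, "C": 1}
STEEP = {"A": 10, "B": 5, "C": 1}


def average(grades, scale):
    """Mean grade points for a list of letter grades under a given scale."""
    return mean(scale[g] for g in grades)


steady = ["B", "B"]  # hypothetical student with consistent results
uneven = ["A", "C"]  # hypothetical student with one strong and one weak result

for name, scale in (("linear", LINEAR), ("steep", STEEP)):
    print(name, average(steady, scale), average(uneven, scale))
```

The steeper scale implicitly says an A is much harder to earn than a B, so it rewards occasional excellence over consistency; the linear scale treats the two profiles as equivalent.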

Anyone who flicks through their academic transcript, or who asks, will soon learn that here at Auckland we use the following scale for calculating Grade Point Averages (GPA):

Grade          Points
A+             9
A              8
A-             7
B+             6
B              5
B-             4
C+             3
C              2
C-             1
D+             0
D              0
D-             0
Anything else  0

Whilst this conversion is used for 'summing' grades across courses, most departments use an entirely different scale6 if they need to do 'grade math' within a course. However, the use of such scales for within-course calculations seems to be falling out of favour, because they tend to encourage some students to focus on the 'grade math' rather than on the learning (e.g. "Was my C a 52 or a 54?"). Of course, this problem also exists within an assignment where individual components7 are assessed: how should they be aggregated?
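As an illustration of that 'grade math', here is a minimal Python sketch of the popular mark-to-grade scale quoted in footnote 6. It reads the footnote's second "C > 50" as C- and its final "D < 40" as D-. Because the footnote writes strict inequalities, the sketch treats a mark that falls exactly on a boundary as the grade below; a real policy would need to state how boundary marks are handled. The names `CUTOFFS` and `letter_for` are hypothetical.

```python
# Mark thresholds from footnote 6; a mark must exceed a threshold to earn that grade.
CUTOFFS = [(90, "A+"), (85, "A"), (80, "A-"), (75, "B+"), (70, "B"), (65, "B-"),
           (60, "C+"), (55, "C"), (50, "C-"), (45, "D+"), (40, "D")]


def letter_for(mark):
    """Convert a percentage mark to a letter grade using the footnote 6 scale."""
    for threshold, grade in CUTOFFS:  # checked from highest to lowest
        if mark > threshold:
            return grade
    return "D-"  # anything at or below 40


for mark in (20, 39, 54, 56, 90, 91):
    print(mark, letter_for(mark))
```

Paired with the GPA table, the conversion shows why students fixate on individual marks: moving from 54 to 56 lifts a C- (1 point) to a C (2 points), while moving from 20 to 39 changes nothing. The bands are also uneven: A+ spans 10 marks, most grades span 5, and D- spans 40.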

As one might notice, we are already a long way from discussing the actual performance of the student.

Conclusion

At the end of the day, the goal must be to have lecturers who can say, in a reliable and consistent manner, "In my opinion, based on the work that was submitted8, this student is an X", where X is some grade value. No grading system can be perfect, but through the use of good judgement, most lecturers can be (and are) consistent9 in their assessment of students' performance (although, of course, some students will always dispute that).

As a final note, a number of schools, particularly in the United States, have grading policies that seem to boil down to "Having looked at the assignments, plus anything the lecturer might additionally include (but not have mentioned) the final grade will be given".


Some example policies (TBC)

Here are a few policies gathered from the 'net.

After the average grade for each student is computed numerically using the weighting listed above, Prof. Farhi will discuss those students who are just a point or two below the grade borderlines with the recitation instructors and tutors. On the basis of this discussion, Prof. Farhi may use his discretion to push a small number of students above the borderline. The most common reason for such a grade increase is the case of a student who has shown very significant improvement during the term. (MIT)
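The mechanical first step of a policy like this, finding the students who fall just below a boundary so they can be discussed, can be sketched in Python. The numeric boundaries, the two-point window, and the student averages are assumptions for illustration only; the quoted policy does not give MIT's actual cut-offs, and the decision to move anyone up remains a matter of the lecturer's discretion, which no code captures.

```python
# Hypothetical grade boundaries on a 0-100 weighted average; not MIT's real cut-offs.
BOUNDARIES = {"A": 90.0, "B": 80.0, "C": 70.0}


def borderline_cases(averages, window=2.0):
    """List (student, grade just missed) for averages within `window` points below a boundary."""
    flagged = []
    for student, avg in averages.items():
        for grade, cut in BOUNDARIES.items():
            if cut - window <= avg < cut:
                flagged.append((student, grade))
    return flagged


# Invented class averages for illustration.
class_averages = {"student_1": 88.5, "student_2": 85.0, "student_3": 79.0}
for student, grade in borderline_cases(class_averages):
    print(f"Discuss {student}: just below the {grade} boundary")
```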



Footnotes

1 Grades below C- are failing grades. That is, D-range grades are reserved for failing work.

2 Apparently, MIT have moved to using the plus/minus system internally, but students' transcripts only report letter grades without the plus/minus.

3 The argument often goes along the lines of "A lecturer can reliably distinguish between an A student, a B student, a C student, and so on; but moving to plus/minus grades introduces greater unreliability into the system and promotes a false sense of accuracy."

4 For example, what exactly does "Just adequate" mean?

5 These discussions must be recognised for what they are: not sources of disagreement, but a way to build a shared understanding of what each grade actually means. They produce tacit, rather than explicit, knowledge.

6 One popular scale has A+ > 90, A > 85, A- > 80, B+ > 75, B > 70, B- > 65, C+ > 60, C > 55, C- > 50, D+ > 45, D > 40, D- < 40. Notice the non-linearity in the scale.

7 Whilst rubrics are often seen as more useful, they have their own pitfalls. Firstly, they often fall into the "addition of grades" problem, and rarely (if ever) do they provide an exhaustive list of attributes.

8 Assessment is meant to be based on what is being assessed, and not on the effort that went into it (unless that is explicitly part of the assessment).

9 There have been a number of tests of whether this is true. As with much research, the results are mixed, but overall the evidence supports the assertion.

Where a customer's strategic value will be increased by combining products and/or services from a building supplier, the supplier that organises itself to recognise and adjust to the customer's strategic needs by offering an optimal package will be able to bid a higher price than competitors who do not combine their offerings in this way, while still offering similar strategic value to the customer and recording a similar probability of success. That is, the integrated builder will make a higher expected profit.

A firm that is organised along disciplinary lines will realise less profit from jobs than one organised around markets.

That is, a disciplinary firm will tend towards commodity work and treat most of its work as such. A market firm will treat its markets as a chance to add value and thus can command a higher premium.
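The expected-profit argument can be written out as a short worked equation. The symbols are mine, and one assumption is not stated in the entry: both firms face the same cost C of delivering the job. Let p > 0 be the probability that a bid succeeds (similar for both firms, as the entry says), and let B_I and B_D be the bids of the integrated (market-organised) and disciplinary firms, with B_I > B_D.

```latex
\begin{align*}
E[\pi] &= p\,(B - C) \\
E[\pi_I] - E[\pi_D] &= p\,(B_I - C) - p\,(B_D - C) = p\,(B_I - B_D) > 0
\end{align*}
```

If the integrated firm's cost were higher, the conclusion would hold only while its price premium exceeded its extra cost.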

I'm working on a few projects at the moment.

Firstly, I'm setting up a couple of Wikis. One I'll use to create a resource for students doing the MBA Research Project. The other will provide an overall guide to The Auckland MBA™ programme.

Secondly, I'm playing around with podcasts. I think they might be a useful way to promote the MBA programme to potential students and to keep existing students informed of what is happening.

Both the Wiki and the podcasts have fairly steep learning curves. Besides getting the necessary software set up (MoinMoin for the wiki, and Audacity for recording and editing the podcasts), the question of how to structure each is quite interesting. For the Wiki, the issue is one of loose versus tight structure; after all, the site can be edited and restructured by almost anyone. For the podcast, whilst there are some fairly specific "rules" on the format, selecting and creating the content is tricky (given the structure).

 

I had an interesting and thought-provoking email from a student regarding the choice of Argentina as the destination for the trip that takes place as part of the International Business (IB) course. He said:

Sorry if I sounded dismissive of your enthusiasm for South America. I am sure that you have a good plan in place. But I do believe strongly That the plan should surely be based on what to do first, rather than where to do it ...

I personally would be very excited to visit South America ... But I would be somewhat uneasy that you perhaps cited some exceptions of trade there proving the rule that NZ's isolation really places its trade base first in the Pacific Rim (which of course includes quite a bit of S America before you say it).

I hope to make links throughout the MBA course within and beyond NZ in the hope of using them into the future for careers, business, etc. I kind of figured that the natural 'address' of such links would be closer to SE Asia, China, India, Malaysia, Indonesia, or the US before Argentina. Perhaps you are performing the function of broadening my horizons.

Not that I am anti your plan - as I say, I am sure there are good things to do and good bases for the trip. ...

The student is asking an excellent question, "What is the purpose behind choosing Argentina?"

I think the "what to do" is well understood, but within that framework there is considerable flexibility as to "where to do it". That doesn't mean the two parts are independent of one another; they are interlinked. As I outline my thinking, I hope this becomes apparent.

I may be being defensive, but my examples from Chile and Argentina were intended to show the plausibility and practicality of having a relationship with those countries, whether through trade, FDI, etc., rather than to say they were a natural choice. Perhaps I could have said more than that, and so I welcome the student's email as a chance to expound on the choice of Argentina.

Let's begin by looking at "the natural address" for such a trip. That is to say, on the basis of trade, where are the links between NZ and the rest of the world? Grabbing the first set of figures I could find (2004), the pattern looks like this:

Exports: Australia 21%, US 14.4%, Japan 11.3%, China 5.7%, UK 4.7%
Imports: Australia 22.4%, US 11.3%, Japan 11.2%, China 9.7%, Germany 5.2%

On that crude basis it would seem that the natural address would be Australia or maybe the US. But, when you look at individual industries (let alone individual firms) the pattern is quite different. I'm sure if we were to consider each student's firm and their probable strategies, there wouldn't be an overall natural choice of destination. (And so, maybe going for the 'aggregate' natural address isn't a bad idea.)

Given a specific firm's unique situation, loading the dice by saying that there is a 'natural linkage' may lead to problems. Albeit based on anecdotal evidence, I feel that too often New Zealand firms move into Australia because "that's what everyone else does", rather than because they have a clear strategic purpose in doing business with that country. CER and other free-trade deals are great if there is a strategic advantage in building a strategy around them1. For example, yesterday I was talking to a CEO who is in exactly this predicament and is in the process of unwinding a decision his predecessor made to enter Australia. Even though Australia is our largest trading partner (and probably will be for some time), it behoves a strategist to consider other alternatives too2.

But before looking in any more depth at the 'natural addresses' for the IB trip, I wonder:

  • To what extent should the IB trip mirror that pattern of export/import activities?
  • What allowance should be made for the anticipated future trends in trade relationships?
  • Should we help students to consider other "non-traditional" areas where NZ might be able to exploit an advantage?
  • To what extent are the experiences and learning from the IB trip/project transferable to other international contexts?

Taking such questions into consideration seems to expand the option set rather than narrowing it down.

But the last point is probably the most important. I would hope that the IB course gives students ways to understand the wider implications of doing business internationally, rather than a narrower focus such as "How do I trade with China?"

Pragmatically, I know that some of the class would prefer to go to China, others to the US, and so on. I also know that, unsurprisingly, the educational objectives behind our choice of destination and the students' own priorities (sometimes focused on the more immediate needs of their firms) produce different "answers".



Footnotes

1 Perhaps a discussion of macro strategy, as pursued at a state-to-state level, might be useful here.

2 Remember Ken Simmond's comment that 4 out of 5 firms have value destroying strategies.

About this Archive

This page is an archive of entries from April 2006 listed from newest to oldest.
