Outcomes Assessment

By admin On Apr 4, 2024

Ovid: Spine

Editors: Bono, Christopher M.; Garfin, Steven R.

Title: Spine, 1st Edition


Bernard A. Pfeifer

Scientific method requires
the formulation of a hypothesis, testing a sample to prove or disprove
its validity, and applying the principle tested to the entire
population. Adherence to this process reduces bias and promotes valid
interpretation of data. Testing of the hypothesis requires the
appropriate experimental design and measurement apparatus. Process improvement (quality control)
is a closed-loop system in which a problem is identified, a remedy is
selected, data on the effectiveness of the remedy are collected, and
the remedy is assessed and either modified or continued based on the
data collected.

Both of these techniques can and should be applied to
the clinical practice of medicine. The outcome initiatives were started
in the late 1980s as a response to the rising cost of health care.
Regional variation in rates of surgery, especially for the treatment of
spinal problems, pointed out the need to assess the results of
treatment and formulate a best method for appropriate patient care.
Only with “hard” data can one choose from the multiple options
available to select the best care for a given problem. In addition,
insurers are assessing process quality through the use of Health Plan
Employer Data and Information Set (HEDIS) questionnaires and other outcome
measures. New initiatives are under way to use quality incentives to
stimulate better patient care through financial remuneration based on
outcome measures of quality.

Still, the most appropriate use of patient-based
outcomes assessment tools is simply to find the best treatment for the
patient. Rothman used the SF-36 health status questionnaire to prove
the high level of improvement that came with total hip arthroplasty and
compared these outcomes with surgery for herniated disc, total knee
arthroplasty, and scoliosis. Surgery for disc herniation showed the
second best improvement, with total knee arthroplasty third. Patients
operated on for scoliosis showed deterioration in health status.

Haher used a new Scoliosis Research Society instrument
to show that an anterior approach to curve correction had better
acceptance and patient-perceived result than posterior correction for
equivalent curves; this was despite the “objective” outcome of solid
fusion and percent correction seen by the operating surgeons. An
additional study by D’Andrea compared the radiographic outcome of
scoliosis patients with this patient-based outcomes questionnaire and
showed little correlation between the surgeon’s success criterion and
the patient’s self-perception. These examples show the power of
outcomes research.

Outcomes assessment is a less formal process than outcomes research. This
evaluation uses an outcomes instrument for periodic reassessment of
practice patterns, usually with comparison to a large group of similar
patients. The outcomes assessment movement had a surge in the 1980s and
1990s with varied success.

Cardiac surgeons under the leadership of their academy
have kept outcomes records on their surgical patients with an easily
definable end point (i.e., survival); this has been a reaction to
outside surveillance but has led to quality improvements for cardiac
surgery patients. In spinal surgery, the North American Spine Society
first presented its low back outcomes instrument in 1991. This
instrument was an effort by a committee within the group to propose a
questionnaire that would be accepted generally for use in low back
patient analysis. The instrument was designed to measure pain and
function, employment status, expectation, and success of meeting the
expectation. The instrument development followed the principles
outlined subsequently.

The American Academy of Orthopaedic Surgeons (AAOS)
incorporated this instrument into its MODEMS database, while at the
same time a private group, the National Spinal Network, formulated its
own instrument set. In the latter part of the 1990s, both groups came
together in accepting the MODEMS data set as a standard. Although the
National Spinal Network still is functioning, the AAOS MODEMS project
has ceased to exist as a data-gathering and comparative group.

Several device manufacturers have their own study groups
that perform outcomes assessment. The inherent bias in this approach
may be overcome through the use of accepted instruments and independent
analysis of the data.

Outcomes assessment can be quite arduous. The quality of
the data set affects the accuracy of the conclusions drawn from it.
Pitfalls include the office mechanics of data collection, expense of
data entry, privacy, and concern regarding the end use of the data
collected.

Most offices use a paper form that the patient
completes, with subsequent key data entry into the computer. The
general patient population must accept the instrument selected. Kiosks
and computer data tablets have been tried but rely on the
sophistication of the end user (patient) and offer no way of
verifying accuracy. The native language of the patient must be
considered so as not to introduce a translation error.

One also has to consider patient flow. In an office
where most of the patients are seen for consultation and only a small
portion go on to surgery, a decision must be made to distribute the
forms economically. If all patients receive and complete one, many will
be discarded if the patients are seen for one consultation only. If
forms are given to patients selected for a procedure, it may be late in
the process (i.e., the patient may be tired), and the completion rate
and accuracy may suffer. Optimal use would be incorporation of a
smaller form within a general patient intake form with a longer one for
patients selected for a procedure of interest. Having a single staff
member responsible for the tracking is ideal; if the surgeon reviews
the form at some point during the interview, this adds to the success
of the project.

Pooling of data provides a benchmark against which to
assess the current patient or group of patients. Consideration of the
patient’s privacy prevents dissemination of identifying data; yet if no
identifying data are sent to the data repository, the same patient may
be duplicated. This bias can occur when the same patient completes
another data set in a second participating physician’s office.
Solutions to this problem include building unique identifiers from combinations
of patient-identifying data (e.g., part of birth date plus part of
social security number) or comparing the patient with a set of
normative data at any given point in time.
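
The unique-identifier approach can be sketched as follows. This is only an illustration under stated assumptions, not a description of any registry's actual method: the function name pooled_patient_key, the choice of birth year and month plus the last four digits of the social security number, and the salted SHA-256 hash are all hypothetical.

```python
import hashlib


def pooled_patient_key(birth_date: str, ssn: str, salt: str = "registry-salt") -> str:
    """Derive a repeatable key from partial identifiers (hypothetical scheme).

    birth_date is expected as YYYY-MM-DD; only the year and month are used.
    ssn may contain dashes; only the last four digits are used.
    """
    year_month = birth_date[:7]  # e.g. "1961-03"
    digits = "".join(ch for ch in ssn if ch.isdigit())
    last_four = digits[-4:]
    raw = f"{salt}|{year_month}|{last_four}"
    # The hash hides the partial identifiers from anyone reading the repository.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


# The same patient seen in two participating offices yields the same key,
# so the repository can detect the duplicate record.
office_a = pooled_patient_key("1961-03-14", "123-45-6789")
office_b = pooled_patient_key("1961-03-14", "123456789")
is_duplicate = office_a == office_b
```

Because only parts of the identifiers are used, two different patients could occasionally share a key; a real project would have to weigh that collision risk against the privacy gain.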

One successful venture was the Maine Medical Assessment
Foundation established by the Maine Medical Association. This group was
federally funded and would send nurses to participant offices to
contact patients directly and have them complete questionnaires and
analyze the data obtained. The group showed the usefulness of surgery
for herniated disc and spinal stenosis. Funding was insufficient to
allow further study of spinal fusion, however, and the project has
since disbanded.

Finally, physicians’ concern that the data will be used
to their detriment has been a factor in the limited success of these
projects thus far. Although studies have shown that the anonymous
presentation of data to physicians has led to beneficial change in
rates of procedures, there still seems to be a reluctance to participate
in pooled outcomes data projects unless participation has been mandated.

INSTRUMENT SELECTION: PATIENT-BASED OUTCOMES INSTRUMENTS

Patient-based outcomes instruments fall into two broad categories:

General health status
Region (or disease) specific

Examples of the former include the SF-36, designed by
Ware, and the Musculoskeletal Functional Assessment, championed by
Swiontkowski. Examples of the latter are the Oswestry Disability Index,
the North American Spine Society (NASS) Instrument, the MODEMS
instrument from the AAOS, and the Scoliosis Research Society instrument.

In taking any measurement, one has to use an
appropriate, accurate, sensitive, and accepted device to assess the
situation. For patient-based outcomes instruments, this process has
been recognized as follows:

An expert panel is assembled to select appropriate items.
A pilot test is run, and the results are assessed to reduce the number of items.
The test is given to groups of patients,
and a retest is given a short time later to ensure reliability; high Pearson
coefficients (by common convention, about 0.8 or greater) are preferred.
The range of responses is analyzed to
avoid clustering, and edge (floor and ceiling) effects are analyzed to test validity. Cronbach’s
α coefficient, a measure of item applicability in a scale, >0.86 is
preferred.
Sensitivity to change is assessed by analyzing pretreatment and posttreatment groups.
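
The two statistics named in the steps above can be computed with a few lines of code. The sketch below uses the standard textbook formulas for the Pearson correlation coefficient and Cronbach's alpha; the function names and the sample numbers are made up for illustration and do not come from any published instrument.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between paired scores, e.g. test and retest totals."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items is a list with one inner list per question; each inner list holds
    the responses of the same patients in the same order.
    """
    k = len(items)
    totals = [sum(patient) for patient in zip(*items)]  # scale total per patient
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: five patients, scale totals at test and at retest.
test_totals = [20, 35, 50, 42, 28]
retest_totals = [22, 33, 52, 40, 30]
reliability = pearson_r(test_totals, retest_totals)

# Hypothetical data: three questions answered by four patients.
responses = [[1, 2, 4, 5], [2, 2, 3, 5], [1, 3, 4, 4]]
alpha = cronbach_alpha(responses)
```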

This type of testing is costly and arduous but required
if one is to apply the instrument across a population of patients. In
addition, cultural issues have to be assessed for applicability of the
instrument to a diverse patient population or for international
application. Finally, an assessment of normative results (i.e., results
obtained from a group of “normal” individuals) may strengthen the
usefulness of the instrument. A comprehensive review of these
instruments was undertaken by NASS. Their Compendium of Outcomes
Instruments is recommended for review before making a selection.

SPECIFIC OUTCOMES INSTRUMENTS AND APPLICATIONS

Pain Diagrams

Pain diagrams, as Palmer first reported in 1949, are one
method of assessing patient condition. Since that time, numerous
articles have appeared assessing the strengths and weaknesses of pain
diagrams. The best recent summary is by Ohnmeiss. The diagram
consists of anterior and posterior outlines of the body with patients
asked to draw symbols indicating the location and the nature of their
pain. Symbols usually are given for ache, numbness, pins and needles,
stabbing, and burning. When Palmer first reported the diagram, a simple
observational characterization of the response in organic and
nonorganic terms was proposed. Uden expanded this classification into
four subgroups: organic, possibly organic, possibly nonorganic, and
nonorganic. Intraobserver reliability was 85% for senior observers and
77% for junior observers. Ransford published a penalty point scoring
system in which points were assigned for pain not in usual locations
seen for root compression and other “commonly encountered causes of
back and leg pain.” High reliability has been reported with the use of
this method. Margolis used a scoring method based on the method used in
burn centers to evaluate extent of injury; a grid was placed over the
drawing, and if a mark was placed in that region, a point was scored.
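
The grid method lends itself to a very small sketch: one point for every grid region that contains at least one mark. The region names below are invented for the example and are not the regions of the published grid.

```python
def grid_score(marked_regions, all_regions):
    """Count the grid regions that contain at least one patient mark."""
    valid = set(all_regions)
    # A set removes repeats, so several marks in one region still score one point.
    return len({region for region in marked_regions if region in valid})


# Hypothetical grid with five regions and a drawing with four marks.
regions = ["low_back_left", "low_back_right", "buttock_left",
           "posterior_thigh_left", "calf_left"]
marks = ["low_back_left", "low_back_left", "buttock_left", "calf_left"]
score = grid_score(marks, regions)  # 3 regions are marked
```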

Ransford showed that the pain diagram agreement with the
Minnesota Multi-Phasic Personality Inventory (MMPI) was 89% overall
with a sensitivity of 93% and a specificity of 79%. Although Uden
showed the pain diagram correctly predicted herniated disc on
myelography, Rankine showed poor prediction of root compression seen on
magnetic resonance imaging. Mckoy found a significant negative
correlation between pain drawing scores and outcomes 6 months after
surgery. The pain diagram has been found to have a significant
correlation with the MMPI hysteria scale and the presence of Waddell criteria,
both of which have a negative correlation with surgical success, so
this result is not unexpected. In her summary publication on correlation
with discogram findings, Ohnmeiss showed that the location of pain
indicated on the diagram was related to the symptomatic level of the
disc. In L3-4 disruption, pain was distributed in the anterior portion;
L4-5 pain was distributed in anterior and posterior portions; and L5-S1
pain was distributed in the posterior portion. It seems that the best
use of the pain diagram is in identifying patients with nonorganic pain
and patients with radiculopathy. There may be a correlation with the
level of disc disruption.

Visual Analogue Scale

The visual analogue scale is a straight line drawing
usually labeled from 0 to 10 on which the patient is asked to rate his
or her pain. Adding gradations to create a ruler has expanded this
system, and images have been added to facilitate patient selection. It
is a simple and effective method to allow the patient to rate the
amount of pain and compare pretreatment and posttreatment levels.
Million et al reported the test-retest reliability of a specific
version of the visual analogue scale. They showed r = 0.88 for each of the individual scales and
r = 0.97 for the total in a test administered by an examiner. The test
has been correlated with isokinetic and nondynamometric tests.

SF-36

The SF-36 is the result of 2 decades of evolution of an
instrument designed to assess a patient’s well-being. It was developed
by Ware et al in 1984 for a Harris survey and later modified for a
medical outcomes study. SF stands for Short Form; the instrument was
shortened because participants did not complete earlier questionnaires that were
deemed too long. The 36 questions are divided into eight domains or
scales (Table 34-1).

TABLE 34-1 SHORT-FORM 36

Scale	Label	Phenomena Captured
Physical functioning	PF	Physical function
Role-physical	RP	Physical disability
Bodily pain	BP	Physical disability and well-being
General health	GH	Physical and mental personal evaluation
Vitality	VT	Physical and mental well-being
Social functioning	SF	Physical and mental disability
Role-emotional	RE	Mental disability
Mental health	MH	Mental function and well-being

Minnesota Multi-Phasic Personality Inventory

The MMPI is a psychiatric test that should be
administered by a qualified professional. As such, it does not perform
as an outcomes instrument but as a test that has been studied and
associated with prediction of surgical outcomes. It originally was
described in 1942 by Hathaway and McKinley and has been modified by Butcher. There are
10 scales generated from a questionnaire of 360 to 570 questions. Most
important for the spine patient are the hysteria, hypochondriasis, and
depression scales. Normative and other testing has been carried out for
this instrument.

Oswestry Low Back Pain Disability Questionnaire

The Oswestry Low Back Pain Disability Questionnaire is a 10-question instrument first published by Fairbank in 1980.
It has been modified to a version 2.0, in which the question regarding
pain intensity has been changed from one about medication usage to one about
intensity itself. The instrument has been
shown to have high test-retest reliability and has been validated
against the Roland-Morris Questionnaire. The scoring is a simple
addition of the responses, then doubling. Change greater than 15% is
considered significant. The instrument’s primary author has published a
review of its use and functionality. The test has been criticized in
that the result is counterintuitive; the score is a disability score, so a higher numeric score indicates
more disability and a patient who is faring poorly. Nonetheless, its
simplicity, length, and acceptance have made it a valuable test.
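
The scoring rule described above (add the responses, then double) can be sketched in a few lines. The sketch assumes ten items each scored 0 to 5, so the doubled sum runs from 0 to 100, and it reads "change greater than 15%" as a change of more than 15 points on that scale; both readings are assumptions, and the function names are hypothetical.

```python
def oswestry_score(responses):
    """Sum ten item responses (each 0 to 5) and double the sum.

    Returns a disability score from 0 to 100; higher means more disability.
    """
    if len(responses) != 10:
        raise ValueError("expected ten responses")
    if any(not 0 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 0 and 5")
    return 2 * sum(responses)


def is_significant_change(before, after, threshold=15):
    """True when the score moved by more than the threshold, in either direction."""
    return abs(after - before) > threshold


# Hypothetical patient before and after treatment.
pre = oswestry_score([3, 4, 2, 3, 4, 3, 2, 3, 4, 2])   # sum 30, score 60
post = oswestry_score([1, 2, 1, 1, 2, 1, 1, 2, 2, 1])  # sum 14, score 28
improved = is_significant_change(pre, post)
```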

Roland Morris Disability Questionnaire

The Roland Morris Disability Questionnaire instrument is
an outgrowth of the Sickness Impact Profile and first was reported in
1983. It is a 24-question yes/no format describing back pain; brevity
and ease of understanding by the recipient are its hallmark. The
Sickness Impact Profile on which it is based has been studied
extensively; this instrument still is relatively untested.

Figure 34-1 (Parts 1 to 3). Example of an office-based low back outcomes instrument. The
NASS pain disability score is calculated by averaging items 38, 42, and
46 through 54. The neurologic impairment score is calculated by
averaging items 39 through 45. The expectation score is calculated by
taking the average of the responses in that area (items 36 to 41). The
pain diagram and visual analogue scale stand alone.

North American Spine Society Low Back Pain Outcomes Instrument

The NASS Low Back Pain Outcomes Instrument was developed
by a committee of the NASS and first formally reported in 1996. To form a
pain score, the committee added questions concerning the amount of
back pain to the Oswestry base. Six additional questions concerning the
amount and bothersomeness of leg symptoms created a neurologic
score. Both scores were an average of the responses to the individual
questions that made up the scale (range 0 to 5). A lower score
indicates a better functioning patient. Finally, there were questions
regarding expectations for outcomes (for the intake form) and
satisfaction with treatment (for the follow-up form). The instrument
was psychometrically studied with good validity and reliability
characteristics.
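
Because each score is an average of the responses on its scale, the calculation is easy to sketch. The item numbers, the sample responses, and the choice to average only the answered items are assumptions made for the example; they are not taken from the published instrument.

```python
def scale_score(responses, item_numbers):
    """Average the answered items that belong to one scale.

    responses maps item number to the response value (0 to 5 here);
    a lower average indicates a better functioning patient.
    """
    answered = [responses[i] for i in item_numbers if i in responses]
    if not answered:
        raise ValueError("no items answered for this scale")
    return sum(answered) / len(answered)


# Hypothetical item numbering: four pain items, six leg-symptom items.
PAIN_ITEMS = [1, 2, 3, 4]
NEURO_ITEMS = [5, 6, 7, 8, 9, 10]

answers = {1: 4, 2: 3, 3: 5, 4: 4, 5: 1, 6: 0, 7: 2, 8: 1, 9: 1, 10: 1}
pain_score = scale_score(answers, PAIN_ITEMS)    # 16 / 4
neuro_score = scale_score(answers, NEURO_ITEMS)  # 6 / 6
```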

TABLE 34-2 NURICK CLASS

Grade 0	Root signs, no cord signs or symptoms
Grade 1	Cord signs, no difficulty walking
Grade 2	Slight difficulty walking, works full-time
Grade 3	Gait difficulty does not allow work, can walk unassisted
Grade 4	Able to walk only with assistance
Grade 5	Chair bound or bedridden

TABLE 34-3 JAPANESE ORTHOPEDIC ASSOCIATION MYELOPATHY SCALE (MAXIMUM SCORE = 17)

1.	Upper extremity motor
		Unable to feed self	0
		Unable to handle chopsticks, uses spoon	1
		Handle chopsticks with much difficulty	2
		Handle chopsticks with slight difficulty	3
		No deficit	4	Max = 4
2.	Lower extremity motor function
		Unable to walk	0
		Need walking aid on flat floor	1
		Needs hand rail on stairs	2
		Lack of stability and smooth reciprocation	3
		No impairment	4	Max = 4
3.	Sensory deficit—upper extremity
		Severe sensory loss or pain	0
		Mild sensory loss	1
		Normal	2	Max = 2
4.	Sensory deficit—lower extremity
		Same grading as per upper extremity		Max = 2
5.	Sensory deficit—trunk
		Same grading as per upper extremity		Max = 2
6.	Bladder function
		Urinary retention	0
		Severe difficulty	1
		Mild dysfunction	2
		Normal	3	Max = 3
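
The table's arithmetic can be checked with a short sketch: the six domain maxima listed above (4, 4, 2, 2, 2, and 3) add up to 17. The domain keys and the function name are hypothetical labels chosen for the example.

```python
# Maximum points per domain, copied from the table above.
JOA_MAX = {
    "upper_motor": 4,
    "lower_motor": 4,
    "sensory_upper": 2,
    "sensory_lower": 2,
    "sensory_trunk": 2,
    "bladder": 3,
}


def joa_total(scores):
    """Add the six domain scores after checking each against its maximum."""
    total = 0
    for domain, maximum in JOA_MAX.items():
        value = scores[domain]
        if not 0 <= value <= maximum:
            raise ValueError(f"{domain} must be between 0 and {maximum}")
        total += value
    return total


# Hypothetical patient with upper and lower extremity deficits.
patient = {"upper_motor": 3, "lower_motor": 2, "sensory_upper": 1,
           "sensory_lower": 2, "sensory_trunk": 2, "bladder": 3}
total = joa_total(patient)  # 3 + 2 + 1 + 2 + 2 + 3
```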

Author’s Instrument

I use an office-based instrument for the lumbar spine that incorporates some of the aforementioned instruments. Figure 34-1 shows this composite instrument.

Other Instruments

MODEMS Questionnaires

The AAOS began an ambitious project in 1992 to design a
complete set of musculoskeletal outcomes instruments that could be used
for any musculoskeletal outcomes research or assessment. For the spine,
the instrument design phase of the project was coordinated with various
subspecialty societies under the auspices of the Council of Spine
Societies. The NASS low back pain outcome instrument was the basis for
the MODEMS spine instrument. The low back module was kept intact with
minor modifications. For the cervical spine, the instrument was
reworded, changing “back” to “neck” and “leg” to “arm.” Parallel scores
(i.e., pain disability and neurologic impairment) were established. An
attempt was made to add scales to measure myelopathy by questioning
stiff and shaky leg symptoms. In addition, questions to assess
comorbidity and an associated scale were constructed. Finally, the
SF-36 was added, and a pain diagram substitution was created. The
expectation scale and the expectation-met scale were retained from the
NASS instrument. All scores were “normalized” to a 0-to-100 scale with
higher numbers indicating better function or a patient in better
condition. The questionnaires have undergone validity and
reliability testing, and normative data have been obtained. In
addition, a scoliosis questionnaire was devised that measured the
patient’s view of their attractiveness (cosmesis); this was added to
the low back questionnaire.
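
The text does not give the formula used to normalize scores to the 0-to-100 scale. One plausible reading, offered only as an assumption, is a linear rescaling that also reverses scales on which a lower raw value is better, as on the NASS-style averages. The function name and parameters below are hypothetical.

```python
def normalize_score(raw, raw_min, raw_max, lower_is_better=True):
    """Rescale a raw score to 0-100, with 100 meaning the best condition.

    This linear rule is an assumed illustration, not the documented MODEMS method.
    """
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    if not raw_min <= raw <= raw_max:
        raise ValueError("raw score is outside the stated range")
    span = raw_max - raw_min
    if lower_is_better:
        # Reverse the direction so that a low raw score maps to a high result.
        return 100 * (raw_max - raw) / span
    return 100 * (raw - raw_min) / span


# A raw average of 1.0 on a 0-to-5 scale where lower is better.
normalized = normalize_score(1.0, 0, 5)
```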

TABLE 34-4 FRANKEL SCALE

Frankel A	Sensory complete
Frankel B	Sensory incomplete
Frankel C	Motor useless
Frankel D	Motor useful
Frankel E	Normal

The MODEMS instruments were tested as part of the
overall MODEMS data-gathering project. The above-specified additions
resulted in a lengthy questionnaire. The MODEMS questionnaires have not
been widely accepted and have been criticized on the basis of length of
instrument and arduous nature of data entry. In addition, they were
part of a project to pool data in a central repository that suffered
from poor participation. Nonetheless, it is a worthy effort, and one
can use selected parts to create valuable scales that can provide
helpful data. The normative data were obtained by administering the
questionnaire to people selected by a research firm to represent a
normal cross section of the population. This is the first instrument to
have this feature and provides a benchmark against which the patient
can be measured.

Cervical Spine Research Society Questionnaire

BenDebba developed a questionnaire at Johns Hopkins that
measures cervical outcomes. Good validity and reliability data were
reported. The questionnaire is about as long as the MODEMS instrument
without the SF-36 and the comorbidity questions. Excellent patient
acceptance was reported. No scale has been created with the measure;
the questions stand independent of each other.

TABLE 34-5 AMERICAN SPINAL INJURY ASSOCIATION MOTOR SCORE

Level	Muscle Group	Right	Left
C5	Elbow flexors	________	________
C6	Wrist extensors	________	________
C7	Elbow extensors	________	________
C8	Finger flexors	________	________
T1	Finger abductors	________	________
L2	Hip flexor	________	________
L3	Knee extensors	________	________
L4	Ankle dorsiflexor	________	________
L5	Toe extensor	________	________
S1	Plantar flexor	________	________
Max		50	50
Total		100

Grades:
0	Absent
1	Trace, flicker
2	Poor, moves with gravity eliminated
3	Fair, moves against gravity
4	Good, moves against resistance
5	Normal
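
The motor score is the sum of the grades in the table: ten key muscle groups per side, each graded 0 to 5, for a maximum of 50 per side and 100 in total. A minimal sketch follows; the function name and the sample grades are hypothetical.

```python
# Key muscle levels in the order listed in the table.
KEY_LEVELS = ["C5", "C6", "C7", "C8", "T1", "L2", "L3", "L4", "L5", "S1"]


def motor_score(right, left):
    """Return (right total, left total, overall total) from per-level grades."""
    def side_total(grades):
        total = 0
        for level in KEY_LEVELS:
            grade = grades[level]
            if grade not in range(6):  # grades are whole numbers 0 through 5
                raise ValueError(f"grade for {level} must be 0 to 5")
            total += grade
        return total

    right_total = side_total(right)
    left_total = side_total(left)
    return right_total, left_total, right_total + left_total


# Hypothetical patient: normal on the right, weakness at L4, L5, and S1 on the left.
right_grades = {level: 5 for level in KEY_LEVELS}
left_grades = dict(right_grades, L4=3, L5=2, S1=4)
right_sum, left_sum, overall = motor_score(right_grades, left_grades)
```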

Scoliosis Research Society Questionnaire

Haher devised a shorter questionnaire than the MODEMS to
measure impact of deformity on patients with scoliosis. The validity
and reliability testing of this instrument was coupled nicely with a
study of anterior versus posterior approach to the correction of curve
and indicated a preference for the former approach. The questionnaire
has been modified by Asher but with minimal change. It seems to be most
valuable in adolescent patients, but more recent reports showed that it
responds to patient condition and that its ceiling and floor effects,
internal consistency, reproducibility, and validity were equal to or
better than those of the SF-36 domains with which it was compared.

MEASURES OF PATIENT STATUS: NOT SELF-REPORTED

Several valuable indices incorporate patient report with
physical examination parameters to assess the status of the patient.
The Japanese Orthopedic Association myelopathy scale and Nurick
classification are valuable for degenerative conditions. The Glasgow
Coma Scale, the Frankel classification of quadriplegia, and the
American Spinal Injury Association classification are examples of this
type of assessment item for a patient after trauma to the spinal cord.

The Nurick scale attempts to quantify patient impairment from myelopathy based on ability to ambulate (Table 34-2). The Japanese Orthopedic Association first presented its myelopathy impairment scale in 1990 (Table 34-3). This scale takes into account and scores in graded fashion the patient’s ability to walk, do fine motor tasks with the upper
extremities, feel sensation in trunk and extremities, and control
bladder function. It has been modified for use in Western countries by
substituting use of knife and fork for chopstick use. Numerous
publications have cited the scale, but there has never been a
scientific test of its sensitivity and specificity.

TABLE 34-6 GLASGOW COMA SCALE

1.	Eye opening:			Max = 4
		Spontaneous	4
		Responds to sound	3
		Responds to pain	2
		Never	1
2.	Verbal response:			Max = 5
		Oriented	5
		Confused conversation	4
		Inappropriate words	3
		Incomprehensible words	2
		None	1
3.	Motor response			Max = 6
		Obeys commands	6
		Localizes pain	5
		Flexion withdrawal	4
		Abnormal flexion	3
		Extension	2
		None	1
				Max = 15
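
The total is the sum of the three components in the table, so it ranges from 3 (1 + 1 + 1) to 15 (4 + 5 + 6). A minimal sketch, with a hypothetical function name:

```python
def glasgow_coma_score(eye, verbal, motor):
    """Add the eye (1-4), verbal (1-5), and motor (1-6) components."""
    limits = (("eye", eye, 4), ("verbal", verbal, 5), ("motor", motor, 6))
    for name, value, maximum in limits:
        if not 1 <= value <= maximum:
            raise ValueError(f"{name} must be between 1 and {maximum}")
    return eye + verbal + motor


# Hypothetical patient: opens eyes to sound, confused conversation, localizes pain.
score = glasgow_coma_score(eye=3, verbal=4, motor=5)  # 3 + 4 + 5
```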

Frankel described a gradation of the patient’s motor ability for use with spinal injury patients (Table 34-4).
He divided this gradation into motor useful and motor useless. This
scale is useful in tracking a patient’s progress or regression during
the course of therapy. The American Spinal Injury Association expanded
on the Frankel classification. The same level of function for each
individual motor group is retained, but key muscle groups are
identified, and a point score is given for each side (Table 34-5).
Bladder and bowel function and the lowest
sensory level present are noted separately. The classification provides a valuable
way of tracking a patient’s progress. Finally, in tracking the trauma
patient, the Glasgow Coma Scale is a valuable adjunct because a patient
with spinal injury often has a concomitant head injury that requires
assessment (Table 34-6).