Educational Assessment

Assessment & Evidence

A comprehensive, interactive guide to understanding assessment types, evidence collection, rubrics, psychometrics, and everything in between, along with hands-on simulations.

Assessment is the backbone of effective education. It informs instruction, measures progress, and provides evidence of learning. But not all assessments are created equal. This guide explores 30 essential concepts in assessment and evidence, each with clear explanations, key points, real-world examples, and interactive simulations to deepen your understanding.

Types of Assessment

Assessments serve different purposes at different stages of learning. Understanding these types helps educators choose the right tool for the right moment, from diagnosing readiness before a unit begins, to monitoring progress during instruction, to evaluating achievement at the end.

Simulation: Assessment Timeline

Drag the slider to explore the assessment timeline

Week 5 of 16

Diagnostic | Formative | Summative

Formative Phase

Ongoing monitoring and feedback during the learning process.

Exit Tickets, Think-Pair-Share, Quizzes, Observations, Concept Maps

Formative Assessment

Ongoing, low-stakes checks during learning

Formative assessment is a range of formal and informal diagnostic testing procedures conducted during the learning process to monitor student understanding and inform instructional adjustments. It is "assessment for learning" rather than "assessment of learning." The primary purpose is to provide feedback that improves teaching and learning while it is still happening.

Key Points

Conducted during instruction, not after
Low-stakes: does not heavily impact grades
Provides immediate, actionable feedback
Helps teachers adjust instruction in real-time
Empowers students to self-monitor progress

Example

A teacher uses exit tickets at the end of each lesson to check if students understood the main concept. Based on responses, the next day's lesson is adjusted to reteach misunderstood topics.

Summative Assessment

Evaluative assessments at the end of a learning period

Summative assessment evaluates student learning at the end of an instructional unit by comparing it against a standard or benchmark. It is "assessment of learning": it measures the cumulative result of the educational process. Summative assessments are typically higher-stakes and contribute to final grades.

Key Points

Occurs at the end of a unit, course, or program
High-stakes: often determines grades or advancement
Measures achievement against defined standards
Provides accountability data for stakeholders
Results are often used for reporting, not instruction

Example

End-of-semester final exams, AP tests, or a culminating project presentation that accounts for 30% of the course grade.

Diagnostic Assessment

Pre-assessment to identify prior knowledge & gaps

Diagnostic assessment is a form of pre-assessment that allows teachers to determine students' individual strengths, weaknesses, knowledge, and skills prior to instruction. It is primarily used to diagnose learning difficulties and to plan appropriate instruction.

Key Points

Administered before instruction begins
Identifies pre-existing knowledge and misconceptions
Helps differentiate instruction from the start
Not typically graded or scored for reporting
Can reveal learning gaps that need targeted support

Example

A math teacher gives a diagnostic test on fractions at the start of the unit. Results show that 40% of students have already mastered equivalent fractions, so she provides enrichment for them while reteaching the basics to the others.

Authentic Assessment

Real-world tasks that demonstrate applied learning

Authentic assessment measures students' ability to apply their knowledge and skills to real-world tasks and challenges. Rather than selecting answers on a multiple-choice test, students demonstrate their understanding through performances, portfolios, and projects that mirror professional or life contexts.

Key Points

Mirrors real-world tasks and challenges
Requires higher-order thinking and application
Often involves collaboration and problem-solving
Values process as much as product
Produces tangible evidence of competence

Example

Instead of a multiple-choice test on government, students simulate a legislative session, drafting bills, debating, and voting — demonstrating civic understanding in an authentic context.

Performance-Based Task

Students demonstrate skills through structured activities

Performance-based tasks require students to perform a task rather than select an answer. These tasks assess the application of knowledge and skills through demonstrations, presentations, experiments, or other active processes. They evaluate both the process and the final product.

Key Points

Requires active demonstration of skills
Assesses both process and product
Often includes a performance component observed by evaluators
Can be individual or group-based
Typically scored with rubrics for consistency

Example

In a science class, students design and conduct an experiment to test water quality in a local stream, then present findings with data analysis and recommendations.

Rubrics & Criteria

Rubrics provide structured frameworks for evaluating student work. They make expectations explicit, ensure consistent scoring, and give students clear targets. The choice between analytic and holistic rubrics depends on the purpose: detailed feedback or efficient overall scoring.

Simulation: Analytic vs Holistic Rubrics

Click any cell to select it — notice how each criterion is scored independently

Criteria | 1: Beginning | 2: Developing | 3: Proficient | 4: Exemplary
Content Knowledge | Minimal facts | Basic understanding | Solid grasp | Expert depth
Critical Thinking | No analysis | Surface analysis | Clear reasoning | Insightful synthesis
Communication | Unclear | Adequate clarity | Well-organized | Compelling & polished
Use of Evidence | No evidence | Weak support | Relevant examples | Rich, persuasive evidence

✦ Analytic advantage: You can see that a student might score 4 on Content Knowledge but 1 on Communication — this specificity enables targeted improvement.

Rubrics (Analytic)

Criteria assessed separately with detailed descriptors

Analytic rubrics break down a task into specific criteria or dimensions, each assessed independently with its own scale and descriptors. This provides detailed, diagnostic feedback about strengths and weaknesses in each area, allowing for targeted improvement.

Key Points

Separates assessment into multiple criteria
Each criterion has its own performance levels and descriptors
Provides detailed, specific feedback per dimension
More time-consuming to create and use
Reveals specific areas needing improvement

Example

A writing rubric with separate criteria: Ideas (1-4), Organization (1-4), Voice (1-4), Word Choice (1-4), Conventions (1-4). A student might score 4 on Ideas but 2 on Conventions.
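The analytic example above can be sketched as a small data structure. This is only an illustration: the Ideas and Conventions scores come from the example, while the other three scores and the helper name summarize_analytic are assumptions made up for the sketch.

```python
# Analytic rubric scores for one essay (1-4 per criterion).
# Ideas=4 and Conventions=2 come from the example; the rest are hypothetical.
analytic_scores = {
    "Ideas": 4,
    "Organization": 3,
    "Voice": 3,
    "Word Choice": 3,
    "Conventions": 2,
}


def summarize_analytic(scores, max_level=4):
    """Summarize per-criterion scores: total plus strongest/weakest criterion."""
    return {
        "total": sum(scores.values()),
        "out_of": max_level * len(scores),
        "strongest": max(scores, key=scores.get),
        "weakest": min(scores, key=scores.get),
    }


summary = summarize_analytic(analytic_scores)
print(summary)
```

A holistic rubric would record only one number for the same essay, so the "weakest criterion" information would not be available.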

Rubrics (Holistic)

Single score based on overall quality

Holistic rubrics assess the overall quality of a student's work as a single score, rather than evaluating individual criteria separately. The evaluator considers all criteria simultaneously and assigns one composite score based on the overall impression of the work.

Key Points

Assigns one overall score for the entire product
All criteria considered simultaneously, not separately
Faster to apply than analytic rubrics
Less detailed feedback for students
Best for quick assessments or when criteria are interdependent

Example

An essay scored as "4: Exemplary," "3: Proficient," "2: Developing," or "1: Beginning" based on the overall quality, without separate scores for individual writing traits.

Success Criteria

Clear descriptions of what success looks like

Success criteria are clear, specific statements that describe what success looks like for a given learning goal. They help students understand exactly what they need to do to demonstrate learning and allow both teachers and students to evaluate progress objectively.

Key Points

Derived from learning objectives
Written in student-friendly language
Observable and measurable
Shared with students before and during learning
Enable self and peer assessment

Example

For a learning goal "Write a persuasive essay," success criteria might include: "I can state my opinion clearly," "I can provide at least 3 supporting reasons," "I can address a counterargument."

Evidence & Benchmarking

Collecting and interpreting evidence of learning is central to sound assessment practice. How we benchmark performance (against fixed criteria or relative to peers) fundamentally changes what the results mean.

Simulation: Criterion vs Norm-Referenced

Criterion-Referenced

Pass threshold: 70%

Student 1: 72% (PASS)
Student 2: 85% (PASS)
Student 3: 91% (PASS)
Student 4: 63% (FAIL)
Student 5: 78% (PASS)
Student 6: 45% (FAIL)
Student 7: 88% (PASS)
Student 8: 55% (FAIL)
Student 9: 93% (PASS)
Student 10: 67% (FAIL)

6 of 10 passed. Because the threshold is fixed, every student could potentially pass.

Norm-Referenced

Ranked against peer group

Score 93: 90th percentile
Score 91: 80th percentile
Score 88: 70th percentile
Score 85: 60th percentile
Score 78: 50th percentile
Score 72: 40th percentile
Score 67: 30th percentile
Score 63: 20th percentile
Score 55: 10th percentile
Score 45: 0th percentile

Success is relative: a high score can be "average" if everyone else also scored high

Learning Evidence

Documented proof that learning has occurred

Learning evidence encompasses all the artifacts, observations, and data that demonstrate student learning has occurred. It includes both direct evidence (test scores, projects) and indirect evidence (surveys, reflections). Effective evidence collection is systematic and aligned with learning outcomes.

Key Points

Should be directly aligned with learning outcomes
Includes multiple forms: written, oral, visual, performative
Must be sufficient to make valid inferences about learning
Both quantitative and qualitative evidence matter
Triangulation of evidence sources increases validity

Example

A teacher collects learning evidence through: pre/post tests, student work samples, observation notes, self-reflection journals, and project portfolios to build a comprehensive picture of each student's learning.

Benchmarking

Comparing performance against defined standards

Benchmarking in education involves comparing student, school, or system performance against defined standards, other institutions, or best practices. It provides reference points to evaluate progress and identify areas for improvement.

Key Points

Uses reference points to evaluate performance
Can be internal (within school) or external (across schools)
Identifies gaps between current and desired performance
Drives continuous improvement processes
Benchmarks should be clearly defined and communicated

Example

A school compares its 8th-grade math scores against the district average, state proficiency standards, and similar schools nationally to identify performance gaps and set improvement targets.

Criterion-Referenced

Scores measured against fixed criteria, not other students

Criterion-referenced assessments measure student performance against a fixed set of predetermined criteria or learning standards. Every student can theoretically achieve the highest level — success is not limited by how others perform. The focus is on what students can do relative to defined expectations.

Key Points

Measures against fixed standards, not other students
All students can potentially achieve mastery
Results indicate what a student can or cannot do
Supports mastery learning approaches
Not designed to produce a bell-curve distribution

Example

A driving test is criterion-referenced — you pass if you meet the defined standards for safe driving, regardless of how many others pass or fail.
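A minimal sketch of criterion-referenced scoring, using the ten scores and the 70% pass threshold from the simulation earlier in this guide. The function name criterion_results is an assumption made up for illustration.

```python
# Scores from the criterion-referenced simulation in this guide.
scores = {
    "Student 1": 72, "Student 2": 85, "Student 3": 91, "Student 4": 63,
    "Student 5": 78, "Student 6": 45, "Student 7": 88, "Student 8": 55,
    "Student 9": 93, "Student 10": 67,
}
PASS_THRESHOLD = 70  # fixed criterion, independent of how peers perform


def criterion_results(scores, threshold):
    """Label each student PASS or FAIL against a fixed threshold."""
    return {
        name: ("PASS" if score >= threshold else "FAIL")
        for name, score in scores.items()
    }


results = criterion_results(scores, PASS_THRESHOLD)
passed = sum(1 for label in results.values() if label == "PASS")
print(f"{passed} of {len(results)} passed")
```

Each result depends only on the student's own score and the threshold, which is why every student could pass (or fail) at the same time.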

Norm-Referenced

Scores compared relative to other test-takers

Norm-referenced assessments compare a student's performance against the performance of a representative group of peers (the "norm group"). Results are interpreted relative to how others performed, ranking students from highest to lowest.

Key Points

Compares student performance to a peer group
Produces rank-order results (percentiles, stanines)
Designed to spread students along a continuum
Useful for selection and classification decisions
Cannot determine if students meet absolute standards

Example

The SAT is norm-referenced — a score of 1200 means you scored higher than approximately 75% of test-takers, but it doesn't specify what content you mastered.

Portfolio & Project-Based

Portfolio and project-based assessments value the process of learning over time. They capture growth, encourage reflection, and produce authentic evidence of competence that traditional tests cannot.

Portfolio Assessment

Curated collection of student work over time

Portfolio assessment is a method where students compile a purposeful collection of their work over time to demonstrate growth, achievement, and self-reflection. Portfolios can be process-oriented (showing evolution) or product-oriented (showing best work), and they support authentic assessment.

Key Points

Collects work samples across time to show growth
Includes student self-selection and reflection
Supports both formative and summative evaluation
Encourages student ownership of learning
Can be digital or physical

Example

An art student's portfolio includes early sketches, revised drafts, final pieces, and reflection journals — showing artistic development and self-awareness over the semester.

Capstone Project

Culminating project integrating knowledge & skills

A capstone project is a multifaceted, culminating assignment that serves as a final academic and intellectual experience for students, typically at the end of a program. It requires students to integrate and apply knowledge and skills acquired throughout their coursework to a real-world problem or research question.

Key Points

Culminating experience at the end of a program
Requires integration of knowledge across disciplines
Often involves research, design, and presentation
Demonstrates readiness for career or further study
Typically assessed with comprehensive rubrics

Example

Engineering students design, build, and present a sustainable water filtration system, applying physics, chemistry, project management, and communication skills from their entire program.

Self & Peer Assessment

When students assess themselves and each other, they develop metacognitive skills, critical thinking, and ownership of learning. These approaches work best when students are trained and given clear criteria.

Simulation: Feedback Loops

Assess

Student completes task or test

Self-Assessment

Students evaluate their own learning & progress

Self-assessment involves students reflecting on and evaluating their own learning, work quality, and progress. It develops metacognitive skills and promotes learner autonomy. When done well, it aligns student self-perception with external evaluation and encourages goal-setting.

Key Points

Develops metacognitive awareness
Encourages ownership and responsibility for learning
Should be guided by clear criteria or rubrics
Works best when students are trained in self-evaluation
May initially show bias toward over- or under-estimation

Example

After completing a presentation, a student fills out a self-evaluation form rating their preparation, delivery, content accuracy, and use of visual aids against a provided rubric.

Peer Assessment

Students evaluate each other's work

Peer assessment involves students evaluating the work or performance of their peers against defined criteria. It develops critical evaluation skills, exposes students to different approaches, and can provide more feedback than a single teacher could give.

Key Points

Builds critical evaluation and feedback skills
Exposes students to diverse approaches and solutions
Requires clear criteria and training to be effective
Can supplement but should not replace teacher feedback
Potential for bias — anonymity and guidelines help

Example

In a writing workshop, students exchange essays and use a structured peer review form to provide feedback on thesis clarity, evidence quality, and writing mechanics.

Feedback Loops

Cyclical process of input, response, and adjustment

Feedback loops in assessment describe the cyclical process where information about performance is used to make adjustments that improve future performance. Effective feedback loops are timely, specific, and actionable, creating a continuous cycle of assessment → feedback → adjustment → improvement.

Key Points

Creates a continuous cycle of improvement
Must be timely, specific, and actionable
Involves both teacher-to-student and student-to-teacher feedback
Negative (corrective) and positive (reinforcing) loops both matter
Technology can accelerate and systematize feedback loops

Example

A teacher gives formative quiz results → student identifies weak areas → student revises → teacher adjusts instruction → student retakes assessment → improved performance. This cycle repeats throughout learning.

Mastery & Progress Tracking

Mastery learning and progress-tracking tools ensure no student is left behind. These approaches emphasize that all students can achieve proficiency given sufficient time and support, and they provide multiple pathways to demonstrate understanding.

Simulation: Mastery Learning

Mastery threshold: 80%

Alex: 55% (A1)
Sam: 82% (A1), ✓ Mastered
Jordan: 68% (A1)
Taylor: 90% (A1), ✓ Mastered
Morgan: 74% (A1)
Key insight: In mastery learning, time is the variable, not achievement. Every student can reach mastery with sufficient support and attempts.

Simulation: Concept Mapping

Click any node to highlight its relationships

Nodes: Assessment, Formative, Summative, Feedback, Rubrics, Grades, Validity, Mastery, Evidence
Link labels: includes, includes, provides, uses, produces, requires, enables, generates, demonstrates

Mastery Learning

Students must demonstrate mastery before progressing

Mastery learning is an instructional strategy where students must demonstrate a predetermined level of mastery (typically 80-90%) on a topic before moving to the next. Students who don't reach mastery receive additional support and opportunities to retake assessments until they succeed.

Key Points

Requires defined mastery thresholds (usually 80-90%)
Students receive additional support until mastery is achieved
All students can reach mastery — time is the variable
Reduces cumulative learning gaps
Requires flexible pacing and multiple assessment opportunities

Example

In a mastery-based math class, a student scoring 65% on a fractions test receives targeted intervention and retakes a parallel assessment. They continue until demonstrating 85% mastery before moving to decimals.
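The mastery check can be sketched as follows, assuming the five first-attempt scores and the 80% threshold from the Mastery Learning simulation. The retake score for Alex is hypothetical, added only to show how a later attempt changes the result; the function name mastery_status is also an assumption.

```python
MASTERY_THRESHOLD = 80  # threshold used in the simulation

# First-attempt scores from the simulation; each list holds all attempts so far.
attempts = {
    "Alex": [55],
    "Sam": [82],
    "Jordan": [68],
    "Taylor": [90],
    "Morgan": [74],
}


def mastery_status(attempts, threshold):
    """Split students into those who have mastered the topic and those who have not.

    A student counts as having mastered the topic once any attempt
    reaches the threshold.
    """
    mastered = [n for n, history in attempts.items() if max(history) >= threshold]
    needs_support = [n for n, history in attempts.items() if max(history) < threshold]
    return mastered, needs_support


mastered, needs_support = mastery_status(attempts, MASTERY_THRESHOLD)
print("Mastered:", mastered)
print("Needs support and a retake:", needs_support)

# Hypothetical retake after targeted intervention (score invented for illustration).
attempts["Alex"].append(84)
mastered_after, needs_support_after = mastery_status(attempts, MASTERY_THRESHOLD)
print("Mastered after retake:", mastered_after)
```

The number of attempts varies per student while the threshold stays fixed, which mirrors the idea that time is the variable, not achievement.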

Exit Tickets

Brief end-of-lesson checks for understanding

Exit tickets are brief formative assessments given at the end of a lesson to quickly gauge student understanding. They typically ask 1-3 focused questions and take just a few minutes to complete. Results inform the next day's instruction.

Key Points

Brief — typically 3-5 minutes to complete
Focused on the lesson's key learning objective
Provides immediate data for instructional planning
Low-stakes and non-threatening for students
Can be paper-based or digital

Example

At the end of a lesson on photosynthesis, students write on an index card: "Explain in one sentence how sunlight helps plants make food" and "What is one question you still have?"

Concept Mapping

Visual diagrams showing relationships between concepts

Concept mapping is a visual assessment technique where students create diagrams that show relationships between concepts. Nodes represent concepts, and labeled connecting lines show how they relate. Concept maps reveal students' structural understanding and can identify misconceptions.

Key Points

Visual representation of knowledge structure
Reveals understanding of relationships, not just facts
Can identify misconceptions and knowledge gaps
Useful as both a learning tool and assessment tool
Can be scored for complexity, correctness, and completeness

Example

After a unit on ecosystems, students create a concept map linking terms like "producer," "consumer," "decomposer," "energy flow," and "nutrient cycle" with descriptive connecting phrases.

Observational Checklist

Structured observation records of student behaviors

An observational checklist is a structured tool that teachers use to systematically record observations of student behaviors, skills, or performance during learning activities. It ensures consistent and objective documentation of what students do in naturalistic settings.

Key Points

Systematic and structured observation method
Predefined behaviors or skills to observe
Reduces subjectivity in classroom observations
Can track frequency, duration, or quality of behaviors
Useful for skills not easily captured by written tests

Example

During a group project, a teacher uses a checklist to note which students: ask clarifying questions, build on others' ideas, stay on task, and help resolve disagreements.

Standardized & Norm-Referenced

Standardized and norm-referenced assessments provide comparable data across large populations. Understanding how percentile ranks, grading curves, and ipsative measures work is essential for interpreting these results correctly.

Simulation: Bell Curve / Grading Curve

Grade bands (F, D, C, B, A), mean μ = 72
F: 7%
D: 24%
C: 38%
B: 24%
A: 7%

Simulation: Percentile Rank Calculator

Class scores: 45, 52, 58, 63, 67, 70, 72, 75, 78, 80, 82, 85, 88, 90, 93

53rd Percentile Rank

You scored equal to or better than 53% of the class

Note: Percentile rank ≠ percentage correct. A score of 78% correct might be at the 73rd percentile if most students scored lower.

High-Stakes Testing

Assessments with significant consequences attached

High-stakes testing refers to assessments that carry significant consequences for students, teachers, schools, or districts based on the results. These tests are used to make important decisions such as grade promotion, graduation, school funding, or teacher evaluation.

Key Points

Results have significant consequences for stakeholders
Used for high-impact decisions (graduation, funding, placement)
Often mandated by policy at state or national level
Critics argue they narrow curriculum and increase anxiety
Supporters argue they ensure accountability and standards

Example

A state-mandated graduation exam that students must pass to receive a high school diploma, regardless of their course grades.

Ipsative Assessment

Measuring progress against one's own past performance

Ipsative assessment compares a student's current performance against their own previous performance, rather than against external criteria or other students. It measures personal growth and improvement over time, celebrating individual progress.

Key Points

Compares current performance to own past performance
Measures personal growth, not relative standing
Highly motivating — everyone can show improvement
Reduces competitive pressure and anxiety
Useful for tracking individual learning trajectories

Example

A student scored 60% on their first essay and 75% on their second. The ipsative measure shows a personal improvement of 15 percentage points, regardless of how classmates performed.
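The arithmetic in this example can be sketched in a few lines. The sketch separates the change in percentage points from the relative change, since the two are easy to confuse; the function name ipsative_change is an assumption made up for illustration.

```python
def ipsative_change(previous, current):
    """Compare a student's current score with their own earlier score.

    Returns (change in percentage points, relative change in percent).
    """
    points = current - previous
    relative = 100 * points / previous
    return points, relative


points, relative = ipsative_change(previous=60, current=75)
print(f"Gain: {points} percentage points ({relative:.0f}% relative to the first essay)")
```

No classmate's score appears anywhere in the calculation, which is what makes the measure ipsative.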

Percentile Rank

Percentage of scores at or below a given score

Percentile rank indicates the percentage of scores in a distribution that a given score is greater than or equal to. It is a norm-referenced statistic that shows a student's relative position within a group. A percentile rank of 75 means the student scored equal to or better than 75% of the reference group.

Key Points

Indicates relative position within a group
Usually reported from 1 to 99; some calculation conventions can also yield 0 or 100
A percentile rank of 50 is the median
Not the same as percentage correct on a test
Useful for norm-referenced interpretation

Example

If a student scores in the 85th percentile on a reading test, they performed as well as or better than 85% of students in the norm group.
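A minimal sketch of the calculation, using the two score sets from the simulations in this guide. Two counting conventions appear on this page: "equal to or better than" (scores at or below) and "percentage below" (scores strictly below). The function name percentile_rank and its inclusive flag are assumptions introduced here to show both.

```python
def percentile_rank(score, scores, inclusive=True):
    """Percentage of scores in the group that a given score meets or beats.

    inclusive=True  counts scores less than or equal to `score`
    inclusive=False counts only scores strictly below `score`
    """
    if inclusive:
        count = sum(1 for s in scores if s <= score)
    else:
        count = sum(1 for s in scores if s < score)
    return 100 * count / len(scores)


# Class scores from the Percentile Rank Calculator simulation.
class_scores = [45, 52, 58, 63, 67, 70, 72, 75, 78, 80, 82, 85, 88, 90, 93]
print(round(percentile_rank(75, class_scores)))

# Peer-group scores from the norm-referenced simulation ("percentage below").
peer_scores = [93, 91, 88, 85, 78, 72, 67, 63, 55, 45]
print(percentile_rank(93, peer_scores, inclusive=False))
print(percentile_rank(45, peer_scores, inclusive=False))
```

With the inclusive convention a score of 75 lands at roughly the 53rd percentile of the class, while the strict convention reproduces the 90th and 0th percentiles shown for the top and bottom peer scores.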

Standardized Testing

Uniform administration and scoring across all test-takers

Standardized testing refers to assessments that are administered and scored in a consistent, predetermined manner. All test-takers answer the same questions (or equivalent forms) under the same conditions and time limits, and scores are calculated using uniform procedures.

Key Points

Uniform administration procedures for all test-takers
Standardized scoring and reporting methods
Enables fair comparison across groups and settings
Can be norm-referenced or criterion-referenced
Includes established reliability and validity data

Example

The GRE is standardized — all test-takers see equivalent questions, have the same time limits, and scores are calculated through the same process, enabling comparison across institutions.

Grading Curve

Adjusting grades based on score distribution

Grading on a curve refers to adjusting student scores or assigning grades based on the distribution of scores in a class, often fitting them to a normal (bell) curve. It ensures a predetermined distribution of grades (e.g., 10% A, 20% B, 40% C, 20% D, 10% F).

Key Points

Grades adjusted based on class score distribution
Often assumes a normal (bell curve) distribution
Can artificially limit the number of top grades
May create unhealthy competition among students
Some curves boost all grades; others force a distribution

Example

On a difficult exam where the class average is 60%, a professor curves grades so that the top 10% get A's, next 20% get B's, middle 40% get C's, and so on — regardless of absolute performance.
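A forced-distribution curve like the one in this example can be sketched as below. This is one possible implementation under stated assumptions: the 10/20/40/20/10 split comes from the text, the ten scores are reused from the criterion-referenced simulation in this guide, ties are not given special treatment, and the names CURVE and curve_grades are made up for illustration.

```python
from collections import Counter

# Share of the class (in percent) assigned to each letter, from the top down.
CURVE = (("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10))

scores = {
    "Student 1": 72, "Student 2": 85, "Student 3": 91, "Student 4": 63,
    "Student 5": 78, "Student 6": 45, "Student 7": 88, "Student 8": 55,
    "Student 9": 93, "Student 10": 67,
}


def curve_grades(scores, curve=CURVE):
    """Assign letter grades by rank so the class matches the target distribution."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n = len(ranked)
    grades, start, cumulative = {}, 0, 0
    for letter, percent in curve:
        cumulative += percent
        end = round(cumulative * n / 100)  # rank position where this band ends
        for name in ranked[start:end]:
            grades[name] = letter
        start = end
    return grades


grades = curve_grades(scores)
print(Counter(grades.values()))
print(grades["Student 7"], scores["Student 7"])
```

Student 7 scores 88 yet receives a B, because only one A is available in a class of ten. The grade depends on rank, not on the score itself.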

Quality & Psychometrics

The quality of an assessment depends on its validity and reliability — whether it measures what it claims, and whether it does so consistently. Item analysis and scalability considerations ensure that assessments work well at any scale.

Simulation: Validity & Reliability Matrix

Neither Valid Nor Reliable
V: Low, R: Low

The test is inconsistent AND doesn't measure what it claims. Worst case scenario.

Valid But Not Reliable
V: High, R: Low

The test measures the right thing, but inconsistently. Scores fluctuate unpredictably.

Reliable But Not Valid
V: Low, R: High

The test is consistent, but measures the wrong thing. Consistently wrong!

Both Valid And Reliable
V: High, R: High

The gold standard: measures the right thing consistently.

Remember: Reliability is necessary but not sufficient for validity. A test can be reliable without being valid, but cannot be valid without being reliable.

Simulation: Item Analysis Dashboard

Analyze each test item: ideal difficulty is 0.3-0.7 and discrimination > 0.3

Item | Difficulty Index | Discrimination Index | Status | Distractor Analysis
#1 | 0.92 | 0.08 | Too Easy | A:2% B:4% C:2% D:92%
#2 | 0.55 | 0.72 | Good | A:15% B:55% C:20% D:10%
#3 | 0.48 | 0.65 | Good | A:22% B:10% C:48% D:20%
#4 | 0.15 | 0.12 | Problematic | A:15% B:30% C:40% D:15%
#5 | 0.70 | 0.55 | Good | A:10% B:70% C:12% D:8%
#6 | 0.35 | 0.78 | Hard but Good | A:35% B:25% C:20% D:20%
#7 | 0.88 | 0.22 | Marginal | A:5% B:88% C:4% D:3%
#8 | 0.50 | -0.05 | Broken | A:25% B:25% C:25% D:25%

Good Item

Difficulty 0.3-0.7, Discrimination > 0.3

Needs Review

Too easy/hard or marginal discrimination

Broken Item

Negative discrimination or equal distractor spread
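The legend above can be expressed as a simple rule, sketched here under a few assumptions. The thresholds come from the dashboard (difficulty 0.3-0.7, discrimination > 0.3). The sketch checks only negative discrimination for "Broken" and does not inspect distractor spread, and it collapses the dashboard's finer labels (Too Easy, Marginal, Problematic) into "Needs Review". The function name classify_item is made up for illustration.

```python
def classify_item(difficulty, discrimination):
    """Classify a test item into one of the three legend categories."""
    if discrimination < 0:
        return "Broken"  # low scorers outperform high scorers on this item
    if 0.3 <= difficulty <= 0.7 and discrimination > 0.3:
        return "Good"
    return "Needs Review"  # too easy, too hard, or weak discrimination


# (difficulty, discrimination) pairs for items #1-#8 from the dashboard.
items = [
    (0.92, 0.08), (0.55, 0.72), (0.48, 0.65), (0.15, 0.12),
    (0.70, 0.55), (0.35, 0.78), (0.88, 0.22), (0.50, -0.05),
]

labels = [classify_item(p, d) for p, d in items]
for number, label in enumerate(labels, start=1):
    print(f"#{number}: {label}")
```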

Validity in Testing

Does the test measure what it claims to measure?

Validity refers to the extent to which a test measures what it claims to measure. It is the most important quality of an assessment. There are several types: content validity (covers the domain), criterion validity (correlates with outcomes), construct validity (measures the theoretical construct), and face validity (appears appropriate to test-takers).

Key Points

The most fundamental quality of any assessment
Content validity: covers the full content domain
Criterion validity: predicts relevant outcomes
Construct validity: measures the intended theoretical concept
Validity is not all-or-nothing — it exists on a continuum

Example

A math test claiming to measure algebraic reasoning has low validity if it only tests arithmetic computation — it doesn't measure what it claims to measure.

Reliability in Testing

Consistency and stability of test results

Reliability refers to the consistency of a test's results across different administrations, scorers, or items. A reliable test produces stable and dependable scores. Key types include test-retest reliability (consistency over time), inter-rater reliability (consistency across scorers), and internal consistency (consistency across items).

Key Points

Measures consistency of results, not correctness
Test-retest: consistent results over time
Inter-rater: consistent results across different scorers
Internal consistency: items measure the same construct
Reliability is necessary but not sufficient for validity

Example

If a student takes a personality test on Monday and gets very different results on Friday with no real change, the test has low test-retest reliability.
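One common way to quantify test-retest reliability is the Pearson correlation between scores from the two administrations. The sketch below assumes that approach; all scores are hypothetical and the function name pearson_r is made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical scores for five students on Monday and again on Friday.
monday = [10, 20, 30, 40, 50]
friday_stable = [12, 22, 32, 42, 52]    # same ordering, small uniform shift
friday_unstable = [50, 10, 40, 20, 30]  # ordering scrambled

r_stable = pearson_r(monday, friday_stable)
r_unstable = pearson_r(monday, friday_unstable)
print(f"stable: {r_stable:.2f}, unstable: {r_unstable:.2f}")
```

A correlation near 1 suggests students keep roughly the same relative standing across administrations; a value near zero or below, as in the scrambled case, points to low test-retest reliability.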

Item Analysis

Statistical analysis of individual test questions

Item analysis is the process of evaluating individual test items (questions) for quality using statistical measures. Key metrics include difficulty index (proportion answering correctly), discrimination index (how well an item differentiates between high and low performers), and distractor analysis (effectiveness of wrong answer choices).

Key Points

Difficulty index: proportion of students answering correctly
Discrimination index: separates high from low performers
Distractor analysis: evaluates effectiveness of wrong options
Helps identify problematic or biased items
Essential for improving test quality over time

Example

An item with a difficulty of 0.95 is too easy (almost everyone gets it right), while an item with a discrimination index of 0.05 doesn't differentiate between strong and weak students — both should be revised.
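The two indices can be computed from a matrix of right/wrong responses, as sketched below. The sketch assumes the upper-lower group method for discrimination, comparing the top and bottom 27% of students by total score, which is one common convention. The response data are hypothetical and the function name item_statistics is made up for illustration.

```python
def item_statistics(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    responses: one list per student, with 1 for a correct answer and 0 otherwise.
    difficulty: proportion of all students answering the item correctly.
    discrimination: proportion correct in the top group minus the bottom group.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    k = max(1, round(n_students * group_fraction))     # size of each comparison group
    upper, lower = ranked[:k], ranked[-k:]
    stats = []
    for i in range(n_items):
        difficulty = sum(r[i] for r in responses) / n_students
        discrimination = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        stats.append((difficulty, discrimination))
    return stats


# Hypothetical responses: 10 students x 3 items.
responses = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 1, 0],
    [1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1],
]

stats = item_statistics(responses)
for number, (p, d) in enumerate(stats, start=1):
    print(f"Item {number}: difficulty={p:.2f}, discrimination={d:.2f}")
```

In this made-up data the first item is answered correctly by 9 of 10 students, so it is easy and separates the groups only weakly, while the second item is answered correctly by every student in the top group and none in the bottom group.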

Scalability

Ability to extend assessment to larger populations

Scalability in assessment refers to the ability to extend an assessment method to larger populations while maintaining quality, consistency, and cost-effectiveness. It involves considerations of administration logistics, scoring efficiency, technology infrastructure, and equity across contexts.

Key Points

Concerns expanding assessment to more students/contexts
Must maintain quality and consistency at scale
Technology enables digital assessment at scale
Cost-effectiveness becomes critical at large scale
Equity considerations must be addressed when scaling

Example

A teacher-created portfolio assessment works well for 30 students but doesn't scale to 30,000. A standardized digital assessment with automated scoring scales effectively while maintaining consistency.

Quick Reference: Assessment Taxonomy

Concept | Purpose | Timing | Stakes
Formative | Monitor & adjust | During | Low
Summative | Evaluate achievement | End | High
Diagnostic | Identify gaps | Before | None
Authentic | Real-world application | During/End | Varies
Performance | Demonstrate skills | During/End | Medium
Ipsative | Track personal growth | Ongoing | Low
Norm-Referenced | Rank comparatively | End | High
Criterion-Referenced | Measure vs standards | Any | Varies
Self-Assessment | Develop metacognition | Ongoing | Low
Peer Assessment | Build evaluation skills | During | Low-Med

Key Takeaways

Assessment should serve learning: formative assessment is more powerful for improvement than summative alone.
Validity is the most important quality of any assessment: an invalid test is useless regardless of reliability.
Rubrics make expectations explicit: analytic rubrics for detailed feedback, holistic for efficiency.
Criterion-referenced assessment supports mastery learning; norm-referenced supports selection and ranking.
Self and peer assessment build metacognition but require training and clear criteria.
Feedback loops are the engine of improvement: timely, specific, actionable feedback is essential.
Item analysis keeps assessments healthy: regularly review difficulty and discrimination indices.
Triangulate evidence: no single assessment gives the full picture of student learning.