

PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, PSYCHOLOGY. (c) Oxford University Press USA, 2016. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

date: 21 September 2018

Personality Assessment in Clinical Psychology

Summary and Keywords

Attempts at informal personality assessment can be traced back to our distant ancestors. As the field of Clinical Psychology emerged and developed over time, efforts were made to create reliable and valid measures of personality and psychopathology that could be used in a variety of contexts. There are many assessment instruments available for clinicians to use, with most utilizing either a projective or self-report format. Individual assessment instruments have specific administration, scoring, and interpretive guidelines to aid clinicians in making accurate decisions based on a test taker’s answers. These measures are continuously adapted to reflect the current conceptualization of personality and psychopathology and the latest technology. Additionally, measures are adapted and validated to be used in a variety of settings, with a variety of populations. Personality assessment continues to be a dynamic process that can be utilized to accurately and informatively represent the test taker and aid in clinical decision making and planning.

Keywords: personality assessment, personality, history, instruments, procedures, psychometrics, projective, self-report, future directions


Personality assessment dates back to informal methods employed by our distant ancestors. Current personality assessment techniques are the product of a modern history of developing, testing, and revising measures to reflect contemporaneous conceptualizations of personality and psychopathology. These measures have specific administration and interpretation guidelines to standardize personality assessment practice. As the field of personality assessment continues to evolve, measures and procedures will likely continue to be adapted to be as clinically useful as possible.


Informal personality assessment has been practiced since before written records. Judging and placing value on others’ observed personality characteristics allowed our distant ancestors to understand and predict their friends’ and foes’ behavior. This would have served as an advantage and useful survival tactic (Miller, 2007). Over the course of time, this process has developed into more sophisticated and, eventually, formal methods of understanding and predicting human behavior (Ben-Porath & Butcher, 1991).

Attempts to formalize personality assessment can be traced back to biblical times, often during times of war. Gideon is described in the Old Testament as selecting soldiers to take to war. One of his selection criteria was based on a direct observation of the trait of fear. He allowed soldiers who were afraid to return home. Gideon’s second criterion compared those who drank from a pond with their cupped hands with those who drank lying down. This criterion assumed that the individuals who drank lying down were less vigilant and cautious than those who drank with their hands (Hathaway, 1965). In China, as far back as 1115 BC, candidates for government positions were examined for their proficiency in six arts to test their character (Ben-Porath & Butcher, 1991; Dubois, 1970). Personality assessment methods have continued to develop over time, with assessment methods reflecting current understanding of human nature.

In the 19th and early 20th centuries, as the field of psychology was starting to emerge as a science, methods of personality assessment began to shift to empirical observations, with some methods proving, eventually, more effective than others. Phrenology rose in popularity during this time, and was the first instance of a commercialized and marketed product of psychology (Ben-Porath & Butcher, 1991). Phrenology originated from observations and speculations about personality made by the German physician Franz Joseph Gall in the late 18th and early 19th centuries. Gall made his first empirical observations in a prison, where he noticed that pickpockets had a recurring protrusion at a certain location on their head. He deduced that this location was above the area of the brain responsible for the personality characteristic of acquisitiveness. Through continuous observations of head shapes and their associated personality characteristics, Gall posited six assumptions for his method of personality assessment. These assumptions were that people have innate mental faculties, the brain is the mind’s organ, the shape and size of the brain are distinguishable by the shape and size of the skull, the mind has distinct faculties and the brain has distinct organs, the size of each organ can be estimated and is a measure of power, and when each organ is active it can influence the body with certain attitudes and movements. During an era when there was a focus on the general laws of the mind, Gall’s theory brought into focus the idea of individual psychological differences (Ben-Porath & Butcher, 1991).

The first structured rating scale was published by Heymans and Wiersma in 1906 (Heymans & Wiersma, 1906). The measure consisted of 90 items and was rated by 3,000 physicians based on patients with whom they were familiar. The physicians rated their patients by underlining any traits relevant to that person. The authors considered and organized the physicians’ ratings in terms of three domains: emotionality, activity, and secondary function (related to what is now understood as extraversion and introversion) (Eysenck, Hendrickson, & Eysenck, 2013). The first self-report personality inventory was published in 1919 by Robert Woodworth to detect psychiatric problems and maladjustment in soldiers recruited for the United States Army during World War I. Woodworth collected data using soldiers with “shell shock” and college students (Butcher, 2009; Woodworth, 1919). This marked a significant shift toward modern practices.

Many of the current, most widely used personality measures were created in the early to mid-1900s. Projective techniques rose in popularity at this time and were based on the assumption that aspects of individuals’ personality can be assessed by having them provide a judgment or structure to ambiguous or unstructured stimuli (Cohen, Swerdlik, & Sturman, 2013). Two of the most commonly used projective measures, the Rorschach Inkblot Test (Rorschach, 1942) and Thematic Apperception Test (TAT; Morgan & Murray, 1935), were created in the 1920s and 1930s. The Rorschach involves the interpretation of a person’s responses to inkblots. The TAT involves making inferences about personality based on the stories a test taker creates in response to a series of pictures.

The first comprehensive self-report measure of personality, the Minnesota Multiphasic Personality Inventory (MMPI), was published in 1943 by Starke Hathaway and J. Charnley McKinley. The MMPI was developed to be an empirically based, differential diagnostic tool. As detailed by Ben-Porath (2012), Hathaway and McKinley were influenced by the model of psychopathology in use at that time, behavioral and psychodynamic thinking, and their psychometric knowledge and experience. Items were chosen for the scales by contrasting responses from samples of differentially diagnosed patients with those of a nonclinical sample. An examinee’s raw scale scores were calculated by counting the items they answered in the direction more often answered by the specific scale’s targeted disorder group than by the nonclinical sample (Ben-Porath, 2012).

Development of some of the most widely used projective and self-report techniques coincided with World War II in the context of a pressing need for psychological services in two domains: clinical services and personnel selection (Butcher, 2009). Since then, researchers and clinicians have worked to expand the role of personality assessment into a variety of settings. They have continued to develop, adapt, and test various measures of personality to validate their appropriateness for use in diverse contexts. This includes examining the utility of personality assessment with populations of various ages, ethnicities, races, sexes, cognitive abilities, and overall levels of functioning. Currently, personality assessors may choose from an array of instruments and methods of administration, depending upon the specific needs of their clients and referral sources. When selecting which measures to use, clinicians are guided to consider several features of formal assessment methods (AERA, 2014) discussed next.

Reliability, Validity, and Standardization


Reliability reflects the accuracy of test scores. In classical test theory, observed test scores include a true score element, reflecting the individual’s actual standing on the targeted construct and an element of random measurement error. The extent of measurement error reflects the unreliability or inaccuracy of the observed score. Knowledge of a test score’s reliability informs the user of the accuracy of information provided by the test. Reliability can be estimated in a number of ways. The most informative approach to estimating reliability depends upon the nature of the targeted construct and the method for administering a test.
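The classical test theory decomposition described above can be written compactly. The symbols are conventional notation rather than notation taken from this article: X is the observed score, T the true score, E the random measurement error, and r_{XX'} the reliability coefficient. The variance identity assumes, as classical test theory does, that error is uncorrelated with the true score.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, a reliability of 1 means that all observed score variance is true score variance, and lower values indicate a larger share of random error.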

Test-Retest Reliability is estimated by correlating scores obtained by a sample of test takers across two different administrations separated by a relatively short time interval. This is a useful estimate of reliability when the trait being measured is relatively stable and not expected to change between the designated time points. Parallel and Alternate Forms Reliability are estimated by correlating scores obtained by a representative sample on purportedly parallel or alternate versions of a test administered at the same time. This approach is only useful when parallel forms of a measure are available. Split-half Reliability is estimated by correlating scores obtained on items from equivalent halves of a test administered at the same time. Internal Consistency Reliability is estimated by calculating the average correlation between all possible pairs of items on a scale and correcting this average to reflect the actual number of items on that measure. Inter-Rater Reliability estimates reliability by examining the consistency or agreement among scores generated by different raters or scorers. For example, it is the extent to which two raters derive similar diagnoses from responses to items on a structured diagnostic interview.
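Several of the estimates described above reduce to a correlation, sometimes followed by a length correction. The following is a minimal sketch in Python, not a procedure taken from any test manual. The function names and the small `RESPONSES` table are hypothetical, the split is assumed to be odd versus even items, and the internal consistency estimate shown is the standardized form that steps up the average inter-item correlation with the Spearman-Brown formula.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r, factor):
    """Project a reliability r to a test `factor` times as long."""
    return factor * r / (1 + (factor - 1) * r)


def retest_reliability(time1, time2):
    """Correlate total scores from two administrations of the same test."""
    return pearson(time1, time2)


def split_half(item_scores):
    """Correlate odd- and even-item half scores, then correct to full length."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(odd, even), 2)


def standardized_alpha(item_scores):
    """Average inter-item correlation, corrected for the number of items."""
    k = len(item_scores[0])
    columns = [[row[j] for row in item_scores] for j in range(k)]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    mean_r = sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
    return spearman_brown(mean_r, k)


# Hypothetical data: rows are examinees, columns are items scored 0 or 1.
RESPONSES = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]

if __name__ == "__main__":
    print(round(split_half(RESPONSES), 3))
    print(round(standardized_alpha(RESPONSES), 3))
```

A sample this small is for illustration only. In practice these coefficients are estimated on large, representative samples, and an item with no response variance would make the correlation undefined.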


The utility of a personality measure is often appraised based on information concerning the reliability and validity of the test scores it produces. Information concerning the validity of test scores pertains to whether, and to what extent, they reflect the test taker’s standing on the targeted variables (Groth-Marnat & Wright, 2016). Because these variables are unobservable constructs, evidence of test score validity often takes the shape of correlations between test scores and designated criteria. Validation research is an ongoing process that yields growing evidence of the validity of measures and underlying personality theory (Butcher, 2009; Messick, 1995).

There are multiple indicators of test score validity. Content Validity pertains to the extent to which test items adequately canvass the relevant content domain for the targeted construct. For a diagnostic interview, for example, to what extent do the items allow the assessor to consider the formal criteria for diagnosing a given disorder? Criterion Validity can be examined based on correlations between test scores and relevant extra-test criteria. The strength of associations between test scores and criteria expected to be associated with the test reflects Convergent Validity. The absence of associations with criteria that would not be expected to correlate with a scale reflects its Discriminant Validity. Construct Validity pertains to the extent to which test scores relate to a range of criteria as would be expected based upon theory about the constructs assessed. As detailed by Cronbach and Meehl (1955), establishing construct validity is a process that may alter our understanding of the construct of interest. Evidence of construct validity is accumulated through hypothesis testing.

Scores on a psychometrically adequate measure should be both reliable and valid. Indeed, the reliability of test scores sets an upper limit on their validity. Recall that the unreliable component of observed test scores consists of random measurement error. By definition, this component cannot correlate with anything, and therefore, does not provide valid information. Establishing the reliability and validity of test scores is a continuous process, which is informed by and updated through research conducted in varied contexts. Additional resources are available for more information regarding general psychometrics and reliability (Allen & Yen, 2002; Nunnally & Bernstein, 1994; Thompson, 2003).
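The upper limit mentioned above can be made concrete. Under classical test theory, the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below is illustrative only; the function name `max_validity` and the example values are assumptions, not figures from the article.

```python
from math import sqrt


def max_validity(reliability_test, reliability_criterion=1.0):
    """Upper bound on a test-criterion correlation under classical test theory.

    Both arguments are reliability coefficients between 0 and 1. The default
    treats the criterion as perfectly reliable, which real criteria rarely are.
    """
    return sqrt(reliability_test * reliability_criterion)


if __name__ == "__main__":
    # A scale with reliability .64 cannot correlate above .80 with any criterion.
    print(round(max_validity(0.64), 2))
    # An unreliable criterion lowers the ceiling further.
    print(round(max_validity(0.81, 0.64), 2))
```

The bound describes a ceiling, not an expectation: a highly reliable score can still have little or no validity for a given purpose.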


Many assessment instruments rely on norms that represent the distribution of test scores from a standardization sample. Comparisons can be made, and meaningful information can be derived from the relationship and similarities between an individual’s test scores and the standardization sample. Some things to consider when evaluating the adequacy of norms are whether the standardization group includes representation from the population of interest, whether the standardization group is large enough, and knowledge of any specialized subgroups of norms as well as the broad norms. Information about norms can be found in test manuals to help the examiner evaluate the representativeness and appropriateness of the norms for the individual they are assessing (Groth-Marnat & Wright, 2016).
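One common way to compare an individual with a standardization sample is to convert the raw score to a standard score. The sketch below uses a linear T score (mean 50, standard deviation 10), which is an assumption chosen for illustration; the article does not specify a metric, and individual instruments may use other transformations described in their manuals. The normative values are hypothetical.

```python
from statistics import mean, stdev


def linear_t_score(raw, norm_scores):
    """Express a raw score relative to a normative sample (mean 50, SD 10)."""
    z = (raw - mean(norm_scores)) / stdev(norm_scores)
    return 50 + 10 * z


# Hypothetical raw scores from a (far too small) standardization sample.
NORM_SAMPLE = [8, 10, 12, 14, 16]

if __name__ == "__main__":
    print(linear_t_score(12, NORM_SAMPLE))            # at the normative mean
    print(round(linear_t_score(16, NORM_SAMPLE), 1))  # above the normative mean
```

The same raw score yields a different standard score against a different normative sample, which is why the representativeness of the norms matters for interpretation.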

In addition to the psychometric properties of test scores just discussed, erroneous test results may arise from other sources. Adherence to standard test administration and scoring procedures is crucial to reducing error. Failing to follow them renders reliability and validity estimates, presumably obtained following standard procedures, inapplicable. Of course, adhering to standard administration procedures requires that they be spelled out with sufficient detail and clarity so that a test user is aware of proper procedures.

Personality Assessment Instruments

Projective Techniques

Rorschach Inkblot Test

The Rorschach Inkblot Method is a projective, performance-based measure of personality that involves the interpretation of a test taker’s responses to inkblots, or, more precisely, to the question “What might this be?” Interpretations of a test taker’s responses to the inkblots are used to make inferences about her or his cognitive style, interpersonal skills, motivations, and other important aspects of personality.

In the early 20th century, while working at a mental hospital, Hermann Rorschach investigated the idea that people with different mental disorders would respond differently to inkblots compared to people with other mental disorders and nonclinical populations. To test this notion, Rorschach experimented with inkblots he created himself. Over time, he selected a group of inkblots that he deemed effective at eliciting responses and exposing individual differences in responses (Weiner & Greene, 2008). Rorschach then administered the selected inkblots to 288 hospital patients and 117 non-patients. The original materials and methods used by Rorschach during these early phases of testing provided the foundation for Rorschach assessment. Additionally, the same 10 inkblots first selected and published by Rorschach are still in use in the early 21st century. Interest in using the inkblot method grew across the world, and had reached the United States by the end of the 1920s (Weiner & Greene, 2008). From that time, many psychologists began to include the Rorschach as part of their assessment battery and developed their own systems for administering and scoring the test.

The Rorschach Inkblot Method (RIM) was widely used through the latter part of the 20th century; however, more recent surveys have shown a substantial decline in its application. Use of the RIM assumes that when confronted with an ambiguous stimulus like an inkblot, an individual must draw on their own personal experiences, perceptions, and ideas to respond to the question “What might this be?” This response, in turn, is thought to be indicative of how they would judge, organize, and respond to ambiguous stimuli in real-life situations (Groth-Marnat & Wright, 2016). Initially, information about the psychometric properties of the RIM was difficult to gather because of the vast array of administration and scoring methods developed by various authors (Beck, 1930; Klopfer & Kelley, 1946). However, Exner (1974) sought to standardize the RIM by publishing the Comprehensive System, integrating previous research into a standardized set of administration and scoring guidelines. This system underwent multiple revisions and became the most commonly used method for administration and scoring of the RIM.

The most recent approach to administration and scoring of the Rorschach is the Rorschach Performance Assessment System (R-PAS; Meyer, Viglione, Mihura, Erard, & Erdberg, 2011). The R-PAS was developed to be an evidence-based approach to using the Rorschach method. Some goals for the development of the R-PAS were to select variables with the strongest empirical and clinical support while eliminating variables with insufficient support, develop and revise indices using modern statistical approaches, provide a simplified system of terminology and calculations, provide empirical evidence and psychological rationale for each interpreted score, provide a statistical procedure for adjusting for the complexity of the record and its impact on each variable, optimize the number of responses to lead to an interpretable and meaningful protocol, compare test takers’ scores to a large reference sample, and offer access to a scoring program through an online platform. Through these goals, the development of the R-PAS was used as an initiative to increase the reliability, validity, and utility of the Rorschach method of personality assessment (Meyer et al., 2011; Meyer & Eblin, 2012).

The RIM is most commonly used in clinical settings. One rationale for using a performance-based measure like the RIM is that test takers may not be aware of or recognize certain personality characteristics in themselves or may not want to admit them (Egloff, Schwerdtfeger, & Schmukle, 2005). Performance-based measures are intended to provide insight into internal behavioral and problem-solving biases that may not be captured when directly asking the individual about these topics. The RIM has been used to aid clinicians in decision making about differential diagnoses, personality characteristics, treatment planning, and outcome evaluation (Exner, 2003; Hartmann, Nørbech, & Grønnerød, 2006). Whereas the Rorschach can be used to aid in identifying personality styles or psychopathology diagnoses, it is less useful at the symptom level. It may help to identify someone presenting with anxious, negative affect, but it would not help to identify specific symptoms like disturbed sleep or muscle tension. The Rorschach provides information on personality style, level of distress, and coping capacity, all of which may be important factors during treatment planning. Clinicians can use these factors to make decisions about the location and intensity of care, goals of treatment, and the order of treatment (Weiner, 2004).

The utility of the Rorschach method has been a topic of much debate. Some psychologists have commented on the psychometrics of the Rorschach, and Wood, Nezworski, Lilienfeld, and Garb (2003) raised the issue of the measure’s utility. Wood and colleagues argued that the test often over-pathologizes clients, even indicating the presence of psychopathology in normal individuals. Additionally, they argued that the method does not add to or improve the diagnostic accuracy of clinicians. They posited that scores on the Rorschach lack acceptable levels of reliability and validity. Furthermore, the administration and scoring procedures can create measurement problems that may lead clinicians to make inferences that lack empirical support and lead to serious consequences. Despite these concerns being raised, the Rorschach continues to be taught in graduate psychology programs (albeit to a lesser extent) and used in clinical practice. Because of this, some psychologists have suggested guidelines for using projective techniques effectively and appropriately in clinical settings (Garb, Lilienfeld, Wood, & Nezworski, 2002).

Thematic Apperception Test

The Thematic Apperception Test (TAT) was developed by Henry Murray and Christiana Morgan in 1935 (Morgan & Murray, 1935). Murray took an idiographic approach to studying personality, with an emphasis on individual differences. In this approach, he believed one’s personality was an interaction of “needs,” the motivational forces from within a person, and “presses,” environmental forces that affect how a person expresses their needs (Weiner & Greene, 2008).

In the early 1930s, Murray became interested in the idea that stories told by people can reveal parts of how they think and feel. He thought that carefully chosen pictures could serve as a stimulus to elicit these stories from people. With Morgan, he selected 20 pictures that they thought would exemplify a critical situation or contain a character that would be relatable to the examinee. These pictures would be used in the original version of the TAT (Morgan & Murray, 1935; Weiner & Greene, 2008). The TAT involves a test taker creating stories about events depicted in a series of pictures. This test was designed to evaluate the individual’s patterns of thinking, beliefs, attitudes, emotional reactions, and general observational abilities. Interpretation of the TAT generally assumes that the examinee identifies with a protagonist character in the pictures, and projects her or his own needs and feelings onto their description of the character and scene. Any thoughts, feelings, or behaviors avoided by the characters may represent the examinee’s own conflicts (Gregory, 2004).

The TAT is comprised of 30 black-and-white pictures that represent a variety of themes and situations. Most pictures involve one or more people who are engaged in an ambiguous task. Cards can be used with girls (G), boys (B), adult females (F), adult males (M), or some combination (e.g., GF). When administering the TAT, the examiner asks the examinee to make up a story about each picture by describing what led up to the scene, what is currently happening in the scene, what the characters may be thinking or feeling, and what the outcome of the situation may be. The examiner records the story to later be scored and interpreted (Gregory, 2004).

Although Murray (1943) established administration guidelines for the TAT, various administration, scoring, and interpretation procedures were published by different authors in the years immediately following publication of the TAT and beyond. The absence of standardized procedures remains a problem in the early 21st century, with clinicians varying instructions, selecting varying subsets of cards to administer to their clients, and relying on their own preferred interpretive guidelines (Gregory, 2004). To address these issues, Westen, Lohr, Silk, Kerber, and Goodrich (1989) developed the Social Cognition and Object Relations Scale (SCORS) to code TAT pictures in a manner that would elicit a person’s underlying attitudes toward themselves, other people, and social relationships. The SCORS contains eight dimensions of object relationships that are coded on a 7-point scale reflecting the maturity level in the actions and attitudes of characters in a response story. The SCORS dimensions are complexity of representation of people, affect quality of relationships, emotional investment in relationships, emotional investment in values and moral standards, understanding of social causality, experience and management of aggressive impulses, self-esteem, and identity and coherence of oneself. Despite the development of the SCORS, the procedure has not been widely adopted, and the TAT is often employed using individual scoring and interpretation strategies (Weiner & Greene, 2008). Because of this, the psychometric properties of the TAT have been difficult to evaluate and establish.

Self-Report Measures

Minnesota Multiphasic Personality Inventory

The Minnesota Multiphasic Personality Inventory (MMPI) was published in 1943 by Starke Hathaway and J. Charnley McKinley of the University of Minnesota Hospital. The MMPI was developed to be a screening instrument to detect and differentially diagnose psychopathology. Test items were selected by examining the psychiatric methods of the time, gathering information from psychiatric examination direction forms, textbooks, directions for case taking in medicine, and previously published scales of social and personal attitudes (Ben-Porath, 2012; Hathaway & McKinley, 1940). Hathaway and McKinley developed the MMPI Clinical Scales from the selected items based on the diagnostic classification system of the 1930s. The scales were developed by contrasting responses of patient samples that were differentially diagnosed with the responses of a nonclinical sample. Items were selected if they were judged to satisfactorily differentiate the various diagnostic samples from the nonclinical sample. The nonclinical sample’s responses were used to develop the norms for the MMPI. The Clinical Scale raw scores were calculated by counting the items the examinee answered in the direction more often answered by a targeted disorder group than by the normative sample.
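The contrasted-groups logic described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not the procedure Hathaway and McKinley actually followed. The item labels, the tiny samples, and the `min_gap` threshold are all hypothetical, and a fixed numeric cutoff stands in for what the text describes as a judgment about whether an item differentiated the groups satisfactorily.

```python
def endorsement_rate(responses, item):
    """Proportion of a sample answering True to an item."""
    return sum(r[item] for r in responses) / len(responses)


def build_key(clinical, nonclinical, min_gap=0.25):
    """Keep items whose True rate differs between groups by at least min_gap.

    Returns {item: keyed_answer}, where keyed_answer is the response given
    more often by the clinical group than by the nonclinical group.
    """
    key = {}
    for item in clinical[0]:
        gap = endorsement_rate(clinical, item) - endorsement_rate(nonclinical, item)
        if abs(gap) >= min_gap:
            key[item] = gap > 0
    return key


def raw_score(answers, key):
    """Count the items an examinee answered in the keyed direction."""
    return sum(1 for item, keyed in key.items() if answers.get(item) == keyed)


# Hypothetical True/False responses from two very small samples.
CLINICAL = [
    {"q1": True, "q2": False, "q3": True},
    {"q1": True, "q2": False, "q3": False},
    {"q1": True, "q2": True, "q3": True},
    {"q1": False, "q2": False, "q3": False},
]
NONCLINICAL = [
    {"q1": False, "q2": True, "q3": True},
    {"q1": False, "q2": True, "q3": False},
    {"q1": True, "q2": True, "q3": True},
    {"q1": False, "q2": True, "q3": False},
]

if __name__ == "__main__":
    key = build_key(CLINICAL, NONCLINICAL)
    print(key)  # q3 is dropped because both groups endorse it equally often
    print(raw_score({"q1": True, "q2": False, "q3": True}, key))
```

Note that the key is built from group differences alone and ignores what the items say, which is the sense in which such scales are empirically keyed.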

In the years following the development of the MMPI, attempts to replicate the validity of the Clinical Scales to predict diagnostic group membership were only slightly successful to unsuccessful for some of the scales. Because of this, the MMPI underwent a transformation that shifted its use from differential diagnosis to a broader application of describing normal and abnormal personality characteristics. Leading to this shift, MMPI users observed that in certain settings, test takers who shared clinical characteristics seemed to produce a similar pattern of scores. This observation led to the development of coding systems and code types to designate different classes of score profiles on the MMPI. These code types presented a convenient way to summarize the patterns of scores on an examinee’s profile, rather than using individual scale scores as the primary source of test information. Through research, a literature was amassed that led to the creation of codebooks and tables that presented the associated empirical correlates for the various code types of the MMPI. These codebooks were used to increase the clinical utility of the MMPI (Ben-Porath, 2012).

The MMPI Validity Scales were developed to assess how an examinee approached the test. Although the Clinical Scales were constructed with little attention paid to item content, the development of the Validity Scales acknowledged that test takers may respond to the item content in misleading ways. The Validity Scales were developed to account for personality tests’ susceptibility to conscious or unconscious deception by the examinee (Ben-Porath, 2012).

The MMPI was one of the most widely used self-report measures of personality and psychopathology, and by the 1970s, it was the most studied assessment instrument (Ben-Porath, 2012). Despite its popularity, critics of the MMPI had identified some fundamental problems with the instrument. One of the main concerns was that the MMPI was developed based on a dated psychopathology classification system. Additionally, critics were concerned with the lack of evidence supporting the configural scoring and interpretation methods involving the code types. Furthermore, among the Clinical Scales, there were excessive intercorrelations caused by a shared general factor, item overlap, and excessive heterogeneity saturated with invalid subtle items (Ben-Porath, 2012). Other concerns focused on the increasingly dated and narrowly focused normative sample, which matched closely the originally targeted population of 1940s University of Minnesota Hospital patients, but was no longer representative of the much wider population with which the test came to be used (Ben-Porath, 2012).

The MMPI was revised and published in 1989 as the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). The MMPI-2 was standardized on a more representative and then contemporary normative sample. The revision process involved deleting obsolete items, retaining the original Validity and Clinical Scales, using a more representative normative sample, and developing new scales. The MMPI-2 consists of 567 items and includes essentially the same Clinical Scales as the original MMPI.

In 2003, Tellegen and colleagues published the MMPI-2 Restructured Clinical (RC) Scales, which were designed to measure the major distinctive core components of the Clinical Scales more effectively (see Table 1). The RC Scales were developed to address the high interscale correlations, item overlap, and over-inclusive item content on the Clinical Scales (Tellegen et al., 2003). Demoralization had been known to be a characteristic shared by most patients, regardless of diagnosis, and was a shared component of all the Clinical Scales (Ben-Porath, 2012; Frank, 1974; Tellegen et al., 2003), thus reducing their discriminant validity. The RC Scales were developed by removing, to the extent necessary and possible, the demoralization component from all eight original MMPI Clinical Scales and identifying the remaining mutually distinctive, core constructs assessed by each (Ben-Porath, 2012; Tellegen et al., 2003). RC Scale research reviewed by Ben-Porath (2012) indicates that the revised measures show comparable to improved reliability when compared with the original Clinical Scales, decreased interscale correlations, comparable to improved convergent validity, and considerably improved discriminant validity.

Using a similar approach, Ben-Porath and Tellegen (2008/2011) developed the full Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF) using the MMPI-2 item pool. The MMPI-2-RF provides a dimensional approach to assessment of personality and psychopathology, which is consistent with current developments in the field (Kotov et al., 2017). Personality and psychopathology are assessed from a hierarchical perspective that includes broad domains as well as more narrowly focused constructs.

The MMPI-2-RF is comprised of 338 items scored on 51 scales: 9 Validity Scales and 42 Substantive Scales (see Table 1). The 9 Validity Scales assess threats to protocol validity and the interpretability of a protocol. The 42 Substantive Scales are comprised of 3 Higher-Order Scales, the 9 RC Scales, 23 Specific Problem Scales, 2 Interest Scales, and the 5 Personality Psychopathology Five (PSY-5) Scales (Ben-Porath & Tellegen, 2008/2011; Harkness & McNulty, 2006).

Two types of threats to protocol validity are assessed by the MMPI-2-RF Validity Scales: non-content-based invalid responding and content-based invalid responding. Non-content-based invalid responding can be divided into three subtypes: nonresponding, random responding, and fixed responding. Content-based invalid responding can be divided into two subtypes: overreporting and underreporting.

Non-content-based invalid responding results from an examinee not comprehending or not accurately reading test items. If an examinee’s responses are not an accurate reflection of their reaction to the test items, then they cannot be used to accurately assess their psychological functioning or dysfunction. Nonresponding occurs when an examinee provides a nonscorable response to a test item. This may be in the form of no answer or responding both “true” and “false” to a test item. Random responding is an unsystematic response pattern that reflects inaccurate reading or incomprehension of test items. Intentional random responding may occur when an examinee can read the test items but still responds in an unsystematic pattern. Unintentional random responding can occur when an examinee responds in an unsystematic pattern because they cannot understand the test item content. Fixed responding occurs when an examinee responds to test items in a systematic way, either indiscriminately “true” or indiscriminately “false,” in a manner that is not based on their reading or comprehension of the test items.

Content-based invalid responding occurs when an examinee responds to test items in a way that creates an inaccurate impression of themselves. Overreporting occurs when an examinee exaggerates the severity of their difficulties or reports having difficulties they do not really have. Intentional overreporting occurs when an examinee purposely exaggerates their symptoms to seem more dysfunctional. This may be motivated by some external benefits (e.g., malingering) or may represent someone with genuine psychopathology who is magnifying the intensity of their symptoms. Unintentional overreporting occurs when an examinee is unaware that they are presenting their symptoms in an exaggerated manner. This may occur if an examinee has a skewed self-awareness and does not realize they are inaccurately reporting their symptoms. Underreporting occurs when an examinee presents as having fewer difficulties than they are experiencing. Intentional underreporting occurs when an examinee purposely denies or downplays their symptoms to appear to be functioning at a higher level. Unintentional underreporting occurs when an examinee unknowingly denies or downplays their symptoms owing to lack of awareness.

Non-content-based invalid responding is assessed by the Cannot Say (CNS), Variable Response Inconsistency (VRIN-r), and True Response Inconsistency (TRIN-r) Validity Scales. The CNS Scale is a measure of the number of unscorable responses. VRIN-r is a measure of inconsistent responses (e.g., true-false) to test item pairs that are similar in content and should be answered in similar directions (e.g., true-true). TRIN-r is a measure of consistent responses to test item pairs that are reversals in content. TRIN-r scores can reflect either fixed true responding or fixed false responding (Ben-Porath, 2012).
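The logic of these three indicators can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the item numbers and pairings are hypothetical placeholders invented for illustration, not actual MMPI-2-RF item pairs, and the function names are likewise assumptions. Operational scoring follows the test manual and converts raw counts to standardized scores, which is not attempted here.

```python
from typing import Dict, List, Optional, Tuple

# A response is True, False, or None (unscorable: omitted, or marked both
# "true" and "false").
Responses = Dict[int, Optional[bool]]

# HYPOTHETICAL pairings for illustration only; not actual test items.
SIMILAR_PAIRS: List[Tuple[int, int]] = [(1, 2), (3, 4), (5, 6)]
REVERSED_PAIRS: List[Tuple[int, int]] = [(7, 8), (9, 10), (11, 12)]


def count_unscorable(responses: Responses, n_items: int) -> int:
    """CNS-style count: items without a scorable answer."""
    return sum(1 for item in range(1, n_items + 1) if responses.get(item) is None)


def count_inconsistent(responses: Responses, pairs: List[Tuple[int, int]]) -> int:
    """VRIN-style count: similar-content pairs answered in opposite directions."""
    total = 0
    for first, second in pairs:
        a, b = responses.get(first), responses.get(second)
        # Pairs containing an unscorable item are skipped.
        if a is not None and b is not None and a != b:
            total += 1
    return total


def count_fixed(responses: Responses, pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """TRIN-style counts: reversed-content pairs answered true-true and false-false."""
    true_true = false_false = 0
    for first, second in pairs:
        a, b = responses.get(first), responses.get(second)
        if a is True and b is True:
            true_true += 1
        elif a is False and b is False:
            false_false += 1
    return true_true, false_false


# Hypothetical protocol: mostly "true", one contradiction, one omitted item.
example: Responses = {item: True for item in range(1, 13)}
example[2] = False  # contradicts item 1 within a similar-content pair
example[5] = None   # unscorable response

print(count_unscorable(example, n_items=12))
print(count_inconsistent(example, SIMILAR_PAIRS))
print(count_fixed(example, REVERSED_PAIRS))
```

In this made-up protocol, the true-true answers to all three reversed-content pairs would point toward fixed true responding, whereas a preponderance of false-false answers would point toward fixed false responding.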

Content-based invalid responding is assessed by the Infrequent Responses (F-r), Infrequent Psychopathology Responses (Fp-r), Infrequent Somatic Responses (Fs), Symptom Validity (FBS-r), Response Bias (RBS), Uncommon Virtues (L-r), and Adjustment Validity (K-r) scales. F-r consists of items rarely answered in the keyed direction by members of the MMPI-2-RF normative sample. Fp-r consists of items rarely answered in the keyed direction by people with genuine psychopathology. Higher scores on F-r and Fp-r may be indicative of inconsistent responding, overreporting, or the presence of significant psychopathology. Inconsistent responding can be ruled out with the aid of the VRIN-r and TRIN-r scales. If it is ruled out, the interpreter must consider collateral, extra-test information in determining whether a test taker is sufficiently disordered to justify elevated scores on the scales. Fs consists of items that are infrequently endorsed by medical patients with various diseases. Higher scores on Fs may indicate inconsistent responding and overreporting of somatic symptoms, or they may reflect a significant medical condition. Here too, scores on the inconsistency scales and collateral information need to be considered when interpreting Fs scale scores. FBS-r consists of items that can identify someone presenting with non-credible symptoms. Higher scores on FBS-r may indicate inconsistent responding and overreporting, and may reflect someone with significant medical conditions or somatic complaints. Here again, inconsistency scale scores and collateral information must be considered when making these different inferences. RBS measures negative response bias in forensic evaluations, containing items selected from a sample of disability claimants and personal injury litigants. Higher scores on RBS may indicate inconsistent responding and overreporting of memory complaints.
L-r consists of items that assist in identifying test takers who deny minor shortcomings to present themselves in a favorable way. K-r consists of items that assist in identifying examinees who are presenting themselves as unrealistically well adjusted. Higher scores on L-r and K-r may indicate inconsistent responding and the underreporting of symptoms as well (Ben-Porath, 2012).

The MMPI-2-RF contains three higher-order scales designed to measure broad domains of psychopathology: Emotional/Internalizing Dysfunction (EID), Thought Dysfunction (THD), and Behavioral/Externalizing Dysfunction (BXD). EID scores represent an overall gauge of an examinee’s emotional functioning by assessing a broad range of emotional and internalizing problems. Lower scores on EID represent a low level of emotional difficulties, and higher scores are indicative of considerable emotional distress and problems. THD scores represent an overall gauge of an examinee’s level of reported thought dysfunction by assessing a range of symptoms related to disordered thinking. Higher scores on THD are indicative of symptoms related to serious thought dysfunction. BXD scores represent a gauge of an examinee’s tendency to act out by assessing a range of behavioral problems. Low scores on BXD represent higher levels of behavioral restraint, while high scores on BXD represent high levels of acting out and under-controlled behaviors (Ben-Porath, 2012).

The 9 RC Scales are measures of Demoralization (RCd), Somatic Complaints (RC1), Low Positive Emotions (RC2), Cynicism (RC3), Antisocial Behavior (RC4), Ideas of Persecution (RC6), Dysfunctional Negative Emotions (RC7), Aberrant Experiences (RC8), and Hypomanic Activation (RC9). The 23 Specific Problems scales are divided into 5 somatic scales, 9 internalizing scales, 4 externalizing scales, and 5 interpersonal scales. The PSY-5 scales are modeled similarly to the emerging model of personality disorders outlined in Section III of the DSM-5 (Anderson, Snider, Sellbom, Krueger, & Hopwood, 2014). Specifically, the scales measure Aggressiveness, Psychoticism, Disconstraint, Negative Emotionality/Neuroticism, and Introversion/Low Positive Emotionality (see Table 1; Ben-Porath, 2012; Ben-Porath & Tellegen, 2008/2011).
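The scale counts reported for the MMPI-2-RF can be cross-checked with simple arithmetic. The short Python sketch below only restates numbers and scale labels given in the text; the variable names are illustrative assumptions rather than part of any scoring software.

```python
# Scale counts exactly as stated in the text.
VALIDITY_SCALES = 9
SUBSTANTIVE_SCALES = {
    "Higher-Order": 3,
    "Restructured Clinical": 9,
    "Specific Problems": 23,
    "Interest": 2,
    "PSY-5": 5,
}
SPECIFIC_PROBLEMS = {
    "Somatic": 5,
    "Internalizing": 9,
    "Externalizing": 4,
    "Interpersonal": 5,
}
# RC Scale abbreviations as listed in the text.
RC_SCALES = ["RCd", "RC1", "RC2", "RC3", "RC4", "RC6", "RC7", "RC8", "RC9"]

substantive_total = sum(SUBSTANTIVE_SCALES.values())
total_scales = VALIDITY_SCALES + substantive_total
specific_problems_total = sum(SPECIFIC_PROBLEMS.values())

# Each check mirrors a count given in the text.
assert substantive_total == 42
assert total_scales == 51
assert specific_problems_total == SUBSTANTIVE_SCALES["Specific Problems"]
assert len(RC_SCALES) == SUBSTANTIVE_SCALES["Restructured Clinical"]

print(substantive_total, total_scales, specific_problems_total)
```

The checks confirm that the reported subtotals are internally consistent: the five substantive groupings sum to 42, the four Specific Problems groupings sum to 23, and the Validity and Substantive Scales together give 51.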

The MMPI-2-RF is used in a variety of settings and with various populations. These include mental health, medical, forensic, correctional, police and public safety, and other nonclinical settings. In mental health and psychiatric settings, the MMPI-2-RF may be used to provide information beyond the presence of psychopathology, such as predicting premature treatment termination (Anestis, Gottfried, & Joiner, 2015). In medical settings, the MMPI-2-RF may be used as a presurgical screener to predict surgical outcome and adherence (Block, Marek, Ben-Porath, & Kukal, 2017; Marek, Ben-Porath, Van Dulmen, Ashton, & Heinberg, 2017). The MMPI-2-RF can be used to assess police officer candidates (Tarescavage, Corey, Gupton, & Ben-Porath, 2015). The MMPI-2-RF is widely used in correctional and forensic settings. For example, it may be used to detect facets of psychopathy, predict future suicidal behaviors, and assess the credibility of symptom reporting by detecting biased reporting (i.e., overreporting or underreporting) (Glassmire, Tarescavage, Martinez, Gomez, & Burchett, 2016; Phillips, Sellbom, Ben-Porath, & Patrick, 2014; Sellbom et al., 2016; Wall, Wygant, & Gallagher, 2015). Additionally, research has shown strong psychometric properties for the MMPI-2-RF in culturally diverse samples (Shkalim, 2015).

Personality Assessment Inventory

The Personality Assessment Inventory (PAI) is a self-report measure of personality authored by Leslie Morey (Morey, 1991, 2007). The development of the PAI was based on a construct validation framework in which items were developed and selected based on their conceptual content and empirical sufficiency. Each scale of the PAI was designed to measure a specific construct, which does not overlap with other scales and provides clinically relevant information (Morey, 2007). Item content was developed after reviewing relevant literature, surveying test takers, and incorporating the leading diagnostic schemas of the time. The content areas were defined, and from there, the scales and subscales were selected. There were initially 2,200 items that were reduced to 1,086 items after being rated by research teams. Expert judges and an external bias panel then reviewed and reduced the item pool to 776 items to be analyzed. After deleting and revising the 776 items, 597 items were retained for a beta version of the test. The beta version was tested, and reliability and validity analyses were conducted, leading to 344 items being retained for the final version of the PAI. The test items are answered in a four-choice format with the options of False, Not at all True (F), Slightly True (ST), Mainly True (MT), or Very True (VT). The test includes 22 nonoverlapping scales: 4 Validity Scales, 11 Clinical Scales, 5 Treatment Consideration Scales, and 2 Interpersonal Scales (see Table 2; Morey, 2007).

The PAI has four validity scales. The Inconsistency Scale identifies inconsistent responding to 10 pairs of items with similar content. The Infrequency Scale includes items that should be answered similarly by all respondents regardless of their clinical status. The Negative Impression Scale is used to identify test takers who are overreporting. The Positive Impression Scale is used to detect underreporting.

The PAI includes 11 clinical scales that are designed to measure content directly related to their labels. Each has associated subscales. The Somatic Complaints Clinical Scale has subscales that are measures of conversion, somatization, and health concerns. The Anxiety Clinical Scale has subscales that are measures of cognitive, affective, and physiological aspects of anxiety. The Anxiety-Related Disorders Clinical Scale has subscales that measure obsessive-compulsive symptoms, phobias, and symptoms related to traumatic events. The Depression Clinical Scale has subscales that are measures of cognitive, affective, and physiological aspects of depression. The Mania Clinical Scale has subscales that are measures of activity level, grandiosity, and irritability. The Paranoia Clinical Scale has subscales that are measures of hypervigilance, persecution, and resentment. The Schizophrenia Clinical Scale has subscales that are measures of psychotic experiences, social detachment, and thought disorder. The Borderline Features Clinical Scale has subscales that are measures of affective instability, identity problems, negative relationships, and self-harm. The Antisocial Features Clinical Scale has subscales that are measures of antisocial behaviors, egocentricity, and stimulus seeking. The Alcohol Problems and Drug Problems Clinical Scales are measures of behaviors and consequences related to alcohol or drug use, abuse, and dependence (see Table 2; Morey, 2007).

The PAI contains five Treatment Consideration Scales. The Aggression Scale has subscales that measure aggressive attitudes, verbal aggression, and physical aggression. The Suicidal Ideation Scale assesses thoughts and ideas related to death and suicide. The Stress Scale assesses current or recent stressful experiences. The Nonsupport Scale measures a client’s perceived lack of social support. The Treatment Rejection Scale measures attributes and attitudes associated with an interest in making personal changes of a psychological or emotional nature, which are relevant to a client’s high or low motivation for treatment (Morey, 2007). The test includes two Interpersonal Scales. The Dominance Scale is a measure of a client’s tendency to be controlling, submissive, or autonomous in interpersonal relationships. The Warmth Scale is a measure of a test taker’s tendency to be empathic and engaging versus mistrusting and rejecting in interpersonal relationships (see Table 2; Morey, 2007).

The PAI has demonstrated adequate to good validity and reliability for both individual and overall scales and subscales. This is reflective of the goal to develop a measure with adequate psychometric properties, a connection to the dominant diagnostic framework, and clinical utility. The subscales tend to have lower psychometric strength than the overall scales, but they are clinically useful when identifying more specific aspects of disorders. PAI interpretation involves assessment of protocol validity, interpreting full scales and subscale scores, and interpreting configurations of full and subscales in clinically meaningful ways. Responses to critical items can also be considered. The PAI can be used in a variety of contexts: mental health inpatient and outpatient clinical assessments and forensic evaluations. The PAI is also used in public safety evaluations, with a public safety interpretive report available to clinicians (Roberts, Thompson, & Johnson, 2000). Research has supported using the PAI with culturally diverse populations (Alamilla & Wojcik, 2013; Cheung et al., 2006; Fernandez, Boccaccini, & Noland, 2008; Groth-Marnat & Wright, 2016; Morey, 2007).

NEO Personality Inventory

The NEO Personality Inventory Third Edition (NEO-PI-3; McCrae & Costa, 2010) is a measure of dimensional normal personality traits based on the Five-Factor Model (FFM) of personality. The five personality dimensions are Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness, with people’s individual personality traits falling somewhere between the low and high range of each dimension. The Neuroticism Scale is a measure of the tendency to experience distress in the form of a variety of negative emotions, such as sadness, anxiety, guilt, and self-consciousness. The Extraversion Scale is a measure of the degree to which someone is outgoing, sociable, and assertive, with a propensity to experience positive emotions. The Openness to Experience Scale is a measure of different personality traits, including curiosity, imagination, and attunement to emotions. The Agreeableness Scale is a measure of attitudes about the goodness and trustworthiness of others and behaviors related to the empathy and respect of others. The Conscientiousness Scale is a measure of traits and behaviors related to orientation, determination, and ability to accomplish things through organization, planning, and an awareness of consequences. Within these broad scales, the NEO also includes 30 facet scales, with six for each of the five domains (see Table 3; McCrae & Costa, 2010).

Costa and McCrae (1985) first developed the NEO to measure enduring personality traits found in the general population. The first version was published in 1978 and only included the domains of Neuroticism, Extraversion, and Openness. They developed the test using a hierarchical structure of overall domains with component facets. In 1987, the domains of Agreeableness and Conscientiousness were added to the NEO PI. The full NEO PI-R was published in 1992 and contained the five domains in addition to six facet scales for each domain (Costa & McCrae, 1992).

Reliability and validity studies of the NEO show that it compares favorably with other personality measures. The test has been widely studied, and it is used in a variety of settings, although typically not in clinical assessments. Additionally, the five-factor structure found in the United States has been shown to emerge consistently in other cultures, such as Belgian, Hungarian, and Korean cultures (Costa & McCrae, 1992; De Fruyt, McCrae, Szirmak, & Nagy, 2004; Groth-Marnat & Wright, 2016; Yoon, Schmidt, & Ilies, 2002). Critics have noted that the absence of validity scales limits the utility of the NEO instruments in applied settings (Ben-Porath & Waller, 1992; Waller & Ben-Porath, 1987).


Personality assessments take place in a variety of contexts such as mental health/psychiatric, medical, legal and forensic, academic/educational, and public safety settings. In mental health and psychiatric settings, personality assessments can provide substantial information beyond a formal diagnosis. This includes information about the type and level of care needed, appropriate activities, appropriateness and type of recommended psychotherapy, and identification of potential treatment barriers or future problems.

Personality assessments also provide relevant information in medical settings. A physician may request an assessment to gain more information about a patient who may have a psychological disorder, emotional distress because of medical problems, neuropsychological difficulties, chronic pain, or chemical dependency, or to obtain a case consultation (Groth-Marnat & Wright, 2016). The results of personality assessment can aid in treatment considerations for any psychological or psychosocial difficulties that are affecting or stemming from medical problems. Additionally, personality assessments may be used as presurgical screeners to assess for the likelihood of a negative reaction to surgery or as a predictor of surgical outcome and adherence (Block, Marek, Ben-Porath, & Kukal, 2017; Marek et al., 2013; Marek, Block, & Ben-Porath, 2015; Martinez, Fernandez del Rio, Lopez-Duran, & Becona, 2017).

There has been a steady increase in the use of psychological assessments in forensic and legal settings over the past few decades. Personality assessments may be used as part of an evaluation of competency to stand trial, in connection with insanity pleas, and in assessments conducted in the context of personal injury or child custody litigation, for example. In these contexts, a personality assessment measure can provide relevant information that can be used to address the forensic referral question. This may include information about an examinee’s behavioral tendencies, personality characteristics, and psychological functioning. Some measures may also provide information about an examinee’s approach to the forensic evaluation. For example, the MMPI-2-RF Validity Scales may provide information about an examinee who is overreporting or underreporting symptoms. This is especially relevant in forensic settings, where there are often incentives to over- or underreport symptoms (Archer & Wheeler, 2013).

As just noted, personality assessments are conducted in multiple settings and with a variety of populations. Because of the varied contexts in which assessments occur, special considerations are required when working with diverse groups. Clinicians should consider the age of a client and select instruments that have been validated and standardized for that client’s age group. An instrument should use age-appropriate language and content so that a client can understand and accurately respond to items. Language proficiency is also an important consideration when working with diverse clients. Clinicians should also consider cultural competency when working with diverse clients. This is important for establishing a trusting and respectful relationship with the client as well as interpreting behavioral and interpersonal observations. When working with diverse clients, an assessment instrument should be carefully selected after considering test equivalency and cultural appropriateness to avoid biased assessment. Some considerations include the client’s acculturation, language proficiency, translations of the instrument, the instrument’s norms, the cultural context of the construct(s) of interest, and any possible alternative tests. When determining the cultural equivalency of a test, one must consider linguistic, metric, and conceptual equivalencies. Linguistic equivalence determines whether a test has been accurately translated and deals with the instrument’s wording and content. Conceptual equivalency determines if the construct has the same meaning across cultures. Metric equivalence determines if the instrument has similar psychometric properties across cultures (Groth-Marnat & Wright, 2016). A clinician should consider aspects of both the client and instrument before beginning an assessment to select the most appropriate and useful instrument to best address the referral question.

Future Directions

As the field of personality assessment progresses, psychologists will likely continue to update current measures and develop new ones. To be as useful and accurate as possible, these measures should be continuously updated to reflect the current conceptualization of personality and psychopathology. This may include updating the language and content of items, updating psychometric and normative data, adapting procedures to include technological advances, and including cultural considerations. As the use of personality assessment expands into more settings, measures should be validated and standardized as needed. Researchers may also consider adapting administration procedures to best meet the needs of these varied settings. For example, shorter measures and administration times may be necessary in medical settings where there is less time for the assessment. The domain of personality assessment has been built upon years of research. It is a dynamic field where researchers continue to monitor and examine clinical findings, while clinicians adapt their procedures to take advantage of these advances.

Table 1. MMPI-2-RF Scales

Validity Scales


Variable Response Inconsistency – Random responding


True Response Inconsistency – Fixed responding


Infrequent Responses – Responses infrequent in the general population


Infrequent Psychopathology Responses – Responses infrequent in psychiatric populations


Infrequent Somatic Responses – Somatic complaints infrequent in medical patient populations


Symptom Validity – Somatic and cognitive complaints associated at high levels with overreporting


Uncommon Virtues – Rarely claimed moral attributes or activities


Adjustment Validity – Avowals of good psychological adjustment associated at high levels with underreporting

Higher-Order (H-O) Scales


Emotional/Internalizing Dysfunction – Problems associated with mood and affect


Thought Dysfunction – Problems associated with disordered thinking


Behavioral/Externalizing Dysfunction – Problems associated with under-controlled behavior

Restructured Clinical (RC) Scales


Demoralization – General unhappiness and dissatisfaction


Somatic Complaints – Diffuse physical health complaints


Low Positive Emotions – Lack of positive emotional responsiveness


Cynicism – Non-self-referential beliefs expressing distrust and a generally low opinion of others


Antisocial Behavior – Rule breaking and irresponsible behavior


Ideas of Persecution – Self-referential beliefs that others pose a threat


Dysfunctional Negative Emotions – Maladaptive anxiety, anger, irritability


Aberrant Experiences – Unusual perceptions or thoughts


Hypomanic Activation – Over-activation, aggression, impulsivity, and grandiosity

Specific Problems (SP) Scales

Somatic Scales


Malaise – Overall sense of physical debilitation, poor health


Gastrointestinal Complaints – Nausea, recurring upset stomach, and poor appetite


Head Pain Complaints – Head and neck pain


Cognitive Complaints – Memory problems, difficulties concentrating

Internalizing Scales


Suicidal/Death Ideation – Direct reports of suicidal ideation and recent suicide attempts


Helplessness/Hopelessness – Belief that goals cannot be reached or problems solved


Self-doubt – Lack of confidence, feelings of uselessness


Inefficacy – Belief that one is indecisive and inefficacious


Stress/Worry – Preoccupation with disappointments, difficulty with time pressure


Anxiety – Pervasive anxiety, frights, frequent nightmares


Behavior-Restricting Fears – Fears that significantly inhibit normal activities


Multiple Specific Fears – Fears of blood, fire, thunder, etc.

Externalizing Scales


Juvenile Conduct Problems – Difficulties at school and at home, stealing


Substance Abuse – Current and past misuse of alcohol and drugs


Aggression – Physically aggressive, violent behavior


Activation – Heightened excitation and energy level

Interpersonal Scales


Family Problems – Conflictual family relationships


Interpersonal Passivity – Being unassertive and submissive


Social Avoidance – Avoiding or not enjoying social events


Shyness – Bashful, prone to feel inhibited and anxious around others


Disaffiliativeness – Disliking people and being around them

Interest Scales


Aesthetic-Literary Interests – Literature, music, the theater


Mechanical-Physical Interests – Fixing and building things, the outdoors, sports

Personality Psychopathology Five (PSY-5) Scales


Aggressiveness-Revised – Instrumental, goal-directed aggression


Psychoticism-Revised – Disconnection from reality


Disconstraint-Revised – Under-controlled behavior


Negative Emotionality/Neuroticism-Revised – Anxiety, insecurity, worry, and fear


Introversion/Low Positive Emotionality-Revised – Social disengagement and anhedonia

Source: (Ben-Porath & Tellegen, 2008/2011)

Table 2. PAI Scales

Validity Scales


Inconsistency – degree to which pairs of items with similar content are rated consistently or inconsistently by the client


Infrequency – items that should be answered similarly by all respondents regardless of their clinical status (e.g., confusion, carelessness, reading difficulties, or other sources of random responding)


Negative Impression – items that present an exaggerated, unfavorable impression or represent bizarre and unlikely symptoms


Positive Impression – items that involve the presentation of a very favorable impression or the denial of relatively minor faults

Clinical Scales and Subscales


Somatic Complaints – items that reflect concerns about physical functioning and health matters


Conversion – items correspond to dramatic physiological symptoms typical of conversion disorders, particularly unusual sensorimotor problems


Somatization – items address routine physical complaints (e.g., headaches, back problems, or gastrointestinal problems)


Health Concerns – items address a preoccupation with health and physical functioning


Anxiety – measures the degree of tension and negative affect experienced by the respondent across different diagnostic categories


Cognitive – expectation of harm, ruminative worry, and cognitive beliefs centering on an ideational vigilance to potential danger


Affective – feelings of tension, apprehension, and nervousness


Physiological – somatic expression of anxiety, particularly autonomic nervous system features (e.g., sweaty palms, racing heart, or rapid breathing)


Anxiety-Related Disorders – clinical and behavioral features of three areas of symptomology related to specific anxiety disorders


Obsessive-Compulsive – symptomatic features of the disorder (e.g., fears of contamination and performance of rituals) and personality elements of the disorder (e.g., perfectionism and hyperattentiveness to detail)


Phobias – common phobic fears (e.g., heights, enclosed places, and public transportation)


Traumatic Stress – reactions to traumatic stressors (e.g., nightmares, sudden anxiety reactions, and feeling of being irreversibly changed by a traumatic event)


Depression – clinical features common to the syndrome of depression


Cognitive – expectancies or beliefs of one’s inadequacy, powerlessness, or helplessness in dealing with the demands of the environment


Affective – experience of feeling distressed, unhappy, sad, and a loss of interest in normal activities


Physiological – vegetative signs of depression (e.g., sleep problems, appetite problems, and lack of energy)


Mania – clinical presentation of mania and hypomania


Activity Level – activity levels of the individual in the ideational (i.e., flight of ideas) and behavioral realms (i.e., motor activity)


Grandiosity – person’s self-evaluation of many talents and abilities


Irritability – volatile irritability that reflects a certain degree of ambition in combination with low frustration tolerance


Paranoia – symptoms and enduring characteristics of paranoia


Hypervigilance – predisposition to distrust people that are not known well


Persecution – beliefs that others are attempting to obstruct or impede the respondent’s efforts


Resentment – hostility and bitterness


Schizophrenia – measures a number of the facets of schizophrenia


Psychotic Experiences – various positive symptoms of schizophrenia (e.g., unusual perceptions and magical thinking)


Social Detachment – social disinterest and lack of affective responsivity


Thought Disorder – range of clarity and freedom from confusion in thought processes


Borderline Features – elements related to severe personality disorder


Affective Instability – propensity to alternate rapidly between various negative affects (e.g., anxious, angry, depressed, and irritable)


Identity Problems – difficulties in maintaining a constant representation of self-identity, typically indicated by sudden shifts in ambitions and goals


Negative Relationships – tendency to become involved in relationships that are very intense and chaotic


Self-Harm – tendency to act impulsively without much attention paid to the consequences of the often self-damaging or self-destructive acts


Antisocial Features – personality and behavioral features relevant to the constructs of antisocial personality and psychopathy


Antisocial Behaviors – antisocial acts during adolescence and adulthood


Egocentricity – callousness and lack of empathy in interactions with others


Stimulus Seeking – willingness to take risk and a desire for novelty


Alcohol Problems – behaviors and consequences related to alcohol use, abuse, and dependence


Drug Problems – behaviors and consequences related to drug use, abuse, and dependence

Treatment Consideration Scales


Aggression – attitudinal and behavioral features relevant to aggression, anger, and hostility


Aggressive Attitude – general affects and attitudes conducive to anger-proneness, particularly the tendency to become easily frustrated or irritated


Verbal Aggression – readiness to display anger through verbal interactions with others


Physical Aggression – past history and present attitudes toward physically aggressive behavior


Suicidal Ideation – thoughts and ideas related to death and suicide


Stress – life stressors that the client is currently experiencing or has recently experienced


Nonsupport – perceived lack of social support


Treatment Rejection – attributes and attitudes associated with an interest in personal changes of a psychological or emotional nature

Interpersonal Scales


Dominance – the extent to which a person is controlling, submissive, or autonomous in interpersonal relationships


Warmth – the extent to which an individual is empathic and engaging versus withdrawing, rejecting, and mistrustful in interpersonal relationships

Source: (Morey, 2007)

Table 3. NEO-PI-3 Scales


Neuroticism

Tendency toward general distress and experiencing negative affects

N1: Anxiety

Fearful in general, a proneness toward being tense and nervous

N2: Angry Hostility

Tendency toward anger, bitterness, and frustration

N3: Depression

Likelihood of experiencing a range of depressive affects (e.g., sadness, hopelessness, and shame)

N4: Self-consciousness

Discomfort with social awkwardness, focused on feelings of shame and embarrassment

N5: Impulsiveness

Inability to control urges

N6: Vulnerability

Perceived inability to cope with stress


Extraversion

Measure of outgoingness, sociability, and assertiveness

E1: Warmth

Comfort with closeness and interpersonal intimacy; likely to be affectionate and friendly

E2: Gregariousness

Preference for having other people around

E3: Assertiveness

Tendency to be dominant, forceful, and socially ascendant

E4: Activity

Tendency to exhibit high energy, rapid movements with a need to keep busy

E5: Excitement Seeking

Need for and enjoyment of highly stimulating activities

E6: Positive Emotions

Tendency to experience positive emotions

Openness to Experience

Measure encompassing multiple traits such as curiosity, happiness, imagination, and abstract thinking

O1: Fantasy

Measure of one’s imagination and fantasy life

O2: Aesthetics

Interest in art and beauty

O3: Feelings

Openness to and awareness of one’s emotional life

O4: Actions

Behavioral aspects of openness (e.g., exploring novel places, foods, and activities)

O5: Ideas

Aspects of openness related to intellectual curiosity

O6: Values

Readiness to reexamine social, political, and religious values


Agreeableness

Attitudes of trustworthiness and goodness of others; behaviors related to the respect of and empathy toward others

A1: Trust

Disposed to believe that others are honest and well intentioned

A2: Straightforwardness

Measure of one’s frankness, sincerity, and ingenuousness

A3: Altruism

Genuine concern for the well-being of others

A4: Compliance

Way one reacts to interpersonal conflicts

A5: Modesty

Measure of humbleness and self-effacement; does not necessarily mean one is lacking in self-confidence or self-esteem

A6: Tender-mindedness

Sympathy and concern for others


Conscientiousness

Orientation toward planning, organizing, and carrying out tasks; someone who is purposeful, determined, and strong willed

C1: Competence

Feeling that one is capable, sensible, prudent, and effective

C2: Order

Preference for orderliness and neatness

C3: Dutifulness

Adherence to one’s ethical principles and fulfillment of one’s moral obligations

C4: Achievement Striving

Aspiring and striving to succeed at one’s goals

C5: Self-Discipline

Ability to begin tasks and follow through with their completion

C6: Deliberation

Tendency to think carefully before acting

Source: (McCrae & Costa, 2010)


Yossef Ben-Porath is a paid consultant to the MMPI Publisher, the University of Minnesota, and Distributor, Pearson. As co-author of the MMPI-2-RF he receives royalties on test sales.


Alamilla, S. G., & Wojcik, J. V. (2013). Assessing for personality disorders in the Hispanic client. In L. T. Benuto (Ed.), Guide to psychological assessment with Hispanics (pp. 215–241). New York: Springer.

Allen, M. J., & Yen, W. M. (2002). Introduction to measurement theory. Prospect Heights, IL: Waveland Press.

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Anderson, J., Snider, S., Sellbom, M., Krueger, R., & Hopwood, C. (2014). A comparison of the DSM-5 Section II and Section III personality disorder structures. Psychiatry Research, 216(3), 363–372.

Anestis, J. C., Gottfried, E. D., & Joiner, T. E. (2015). The utility of MMPI-2-RF substantive scales in prediction of negative treatment outcomes in a community mental health center. Assessment, 22(1), 23–35.

Archer, R., & Wheeler, E. (2013). Forensic uses of clinical assessment instruments (2d ed.). New York: Routledge.

Beck, S. J. (1930). Personality diagnosis by the means of the Rorschach test. American Journal of Orthopsychiatry, 1, 81–88.

Ben-Porath, Y. S. (2012). Interpreting the MMPI-2-RF. Minneapolis: University of Minnesota Press.

Ben-Porath, Y. S., & Butcher, J. N. (1991). The historical development of personality assessment. In C. E. Walker (Ed.), Clinical psychology: Historical and research foundations (pp. 121–156). New York: Plenum Press.

Ben-Porath, Y. S., & Tellegen, A. (2008/2011). MMPI-2-RF: Manual for administration, scoring and interpretation. Minneapolis: University of Minnesota Press.

Ben-Porath, Y. S., & Waller, N. G. (1992). “Normal” personality inventories in clinical assessment: General requirements and the potential for using the NEO Personality Inventory. Psychological Assessment, 4(1), 14–19.

Block, A. R., Marek, R. J., Ben-Porath, Y. S., & Kukal, D. (2017). Associations between preimplant psychosocial factors and spinal cord stimulation outcome: Evaluation using the MMPI-2-RF. Assessment, 24, 60–70.

Butcher, J. N. (2009). Oxford handbook of personality assessment. New York: Oxford University Press.

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A. M., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.

Cheung, F. M., Leung, K., Fan, R. M., Song, W. Z., Zhang, J. P., Fung, H. H., & Ng, S. (2006). Cross-Cultural Personality Assessment Inventory. Psychology and Aging, 21(4), 810–814.

Cohen, R. J., Swerdlik, M. E., & Sturman, E. (2013). Psychological testing and assessment: An introduction to tests and measurement. New York: McGraw Hill.

Costa, P. T., Jr., & McCrae, R. R. (1985). The NEO Personality Inventory: Manual Form S and Form R. Lutz, FL: Psychological Assessment Resources.

Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

DuBois, P. H. (1970). A history of psychological testing. Boston: Allyn and Bacon.

Egloff, B., Schwerdtfeger, A., & Schmukle, S. C. (2005). Temporal stability of the Implicit Association Test—anxiety. Journal of Personality Assessment, 84(1), 82–88.

Exner, J. E. (1974). The Rorschach: A comprehensive system. Volume 1: Basic foundations. New York: Wiley.

Exner, J. E. (2003). The Rorschach: A comprehensive system. Volume 1: Basic foundations (4th ed.). Hoboken, NJ: Wiley.

Eysenck, H. J., Hendrickson, A., & Eysenck, S. G. (2013). Personality structure and measurement (psychology revivals). London: Routledge.

Fernandez, K., Boccaccini, M., & Noland, R. (2008). Detecting over- and underreporting of psychopathology with the Spanish-language Personality Assessment Inventory: Findings from a simulation study with bilingual speakers. Psychological Assessment, 20(2), 189–194.

Frank, J. D. (1974). Psychotherapy: The restoration of morale. American Journal of Psychiatry, 131, 271–274.

De Fruyt, F., McCrae, R., Szirmak, A., & Nagy, J. (2004). The Five-Factor Personality Inventory as a measure of the five-factor model—Belgian, American, and Hungarian comparisons with the NEO-PI-R. Assessment, 11(3), 207–215.

Garb, H. N., Lilienfeld, S. O., Wood, J. M., & Nezworski, M. T. (2002). Effective use of projective techniques in clinical practice: Let the data help with selection and interpretation. Professional Psychology, Research and Practice, 5, 454.

Glassmire, D., Tarescavage, A., Martinez, J., Gomez, A., & Burchett, D. (2016). Clinical utility of the MMPI-2-RF SUI items and scale in a forensic inpatient setting: Association with interview self-report and future suicidal behaviors. Psychological Assessment, 28(11), 1502–1509.

Gregory, R. J. (2004). Psychological testing: History, principles, and applications. Boston: Pearson/A and B.

Groth-Marnat, G., & Wright, A. J. (2016). Handbook of psychological assessment. Hoboken, NJ: John Wiley & Sons.

Harkness, A. R., & McNulty, J. L. (2006). An overview of personality: The MMPI-2 Personality Psychopathology Five (PSY-5) Scales. In J. N. Butcher (Ed.), MMPI-2: A practitioner’s guide (pp. 73–97). Washington, DC: American Psychological Association.

Hartmann, E., Nørbech, P. B., & Grønnerød, C. (2006). Psychopathic and nonpsychopathic violent offenders on the Rorschach: Discriminative features and comparisons with schizophrenic inpatient and university student samples. Journal of Personality Assessment, 86(3), 291–305.

Hathaway, S. R. (1965). Personality inventories. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 451–476). New York: McGraw Hill.

Hathaway, S. R., & McKinley, J. C. (1940). A multiphasic personality schedule (Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249–254.

Heymans, G., & Wiersma, E. (1906). Beiträge zur speziellen Psychologie auf Grund einer Massenuntersuchung. Zeitschrift für Psychologie, 43, 81–127.

Klopfer, B., & Kelley, D. M. (1946). The Rorschach technique: A manual for a projective method of personality diagnosis. New York: Collins.

Kotov, R., Krueger, R., Watson, D., Achenbach, T., Althoff, R., Bagby, R., … Zimmerman, M. (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454–477.

Marek, R. J., Ben-Porath, Y. S., Van Dulmen, M. M., Ashton, K., & Heinberg, L. J. (2017). Using the presurgical psychological evaluation to predict 5-year weight loss outcomes in bariatric surgery patients. Surgery for Obesity and Related Diseases, 3, 514.

Marek, R. J., Ben-Porath, Y. S., Windover, A., Tarescavage, A. M., Merrell, J., Ashton, K., … Heinberg, L. J. (2013). Assessing psychosocial functioning of bariatric surgery candidates with the Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF). Obesity Surgery, 11, 1864.

Marek, R. J., Block, A. R., & Ben-Porath, Y. S. (2015). The Minnesota Multiphasic Personality Inventory–2–Restructured Form (MMPI-2-RF): Incremental validity in predicting early postoperative outcomes in spine surgery candidates. Psychological Assessment, 27(1), 114–124.

Martinez, U., Fernandez del Rio, E., Lopez-Duran, A., & Becona, E. (2017). The utility of the MMPI-2-RF to predict the outcome of a smoking cessation treatment. Personality and Individual Differences, 106, 172–177.

McCrae, R. R., & Costa, P. T., Jr. (2010). NEO Inventories: Professional manual. Lutz, FL: Psychological Assessment Resources.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Psychological Assessment, 7, 741–749.

Meyer, G. J., & Eblin, J. J. (2012). An overview of the Rorschach Performance Assessment System (R-PAS). Psychological Injury and Law, 5(2), 107–121.

Meyer, G. J., Viglione, D. J., Mihura, J. L., Erard, R. E., & Erdberg, P. (2011). Rorschach Performance Assessment System: Administration, coding, interpretation, and technical manual. Toledo, OH: Rorschach Performance Assessment System.

Miller, G. F. (2007). Sexual selection for moral virtues. The Quarterly Review of Biology, 82(2), 97–125.

Morey, L. C. (1991). Personality Assessment Inventory: Professional manual. Odessa, FL: Psychological Assessment Resources.

Morey, L. C. (2007). Personality Assessment Inventory: Professional manual (2d ed.). Odessa, FL: Psychological Assessment Resources.

Morgan, C. D., & Murray, H. A. (1935). A method for investigating phantasies: The Thematic Apperception Test. Archives of Neurology and Psychiatry, 34, 289–306.

Murray, H. A. (1943). Thematic Apperception Test—manual. Cambridge, MA: Harvard University Press.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill.

Phillips, T., Sellbom, M., Ben-Porath, Y. S., & Patrick, C. (2014). Further development and construct validation of MMPI-2-RF indices of global psychopathy, fearless-dominance, and impulsive-antisociality in a sample of incarcerated women. Law and Human Behavior, 38(1), 34–46.

Roberts, M. D., Thompson, J. A., & Johnson, M. (2000). PAI law enforcement, corrections, and public safety selection report module. Odessa, FL: Psychological Assessment Resources.

Rorschach, H. (1942). Psychodiagnostics. New York: Grune & Stratton. (Original work published in 1921.)

Sellbom, M., Drislane, L. E., Johnson, A. K., Goodwin, B. E., Phillips, T. R., & Patrick, C. J. (2016). Development and validation of MMPI-2-RF scales for indexing triarchic psychopathy constructs. Assessment, 23(5), 527–543.

Shkalim, E. (2015). Psychometric evaluation of the MMPI-2/MMPI-2-RF restructured clinical scales in an Israeli sample. Assessment, 22(5), 607–618.

Tarescavage, A. M., Corey, D. M., Gupton, H. M., & Ben-Porath, Y. S. (2015). Criterion validity and practical utility of the Minnesota Multiphasic Personality Inventory–2–Restructured Form (MMPI–2–RF) in assessments of police officer candidates. Journal of Personality Assessment, 97(4), 382–394.

Tellegen, A., Ben-Porath, Y. S., McNulty, J., Arbisi, P., Graham, J. R., & Kaemmer, B. (2003). The MMPI-2 Restructured Clinical (RC) scales: Development, validation, and interpretation. Minneapolis: University of Minnesota Press.

Thompson, B. (2003). Score reliability: Contemporary thinking on reliability issues. Thousand Oaks, CA: SAGE.

Wall, T., Wygant, D., & Gallagher, R. (2015). Identifying overreporting in a correctional setting: Utility of the MMPI-2 Restructured Form validity scales. Criminal Justice and Behavior, 42(6), 610–622.

Waller, N. G., & Ben-Porath, Y. S. (1987). Is it time for clinical psychology to embrace the five-factor model of personality? American Psychologist, 42, 887–889.

Weiner, I. B. (2004). Rorschach Inkblot Method. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment: Instruments for adults (Vol. 3, 3d ed., pp. 553–587). Mahwah, NJ: Lawrence Erlbaum.

Weiner, I. B., & Greene, R. L. (2008). Handbook of personality assessment. Hoboken, NJ: John Wiley & Sons.

Westen, D., Lohr, N. E., Silk, K., Kerber, K., & Goodrich, S. (1989). Object relations and social cognition TAT scoring manual (4th ed.). Ann Arbor: University of Michigan Press.

Wood, J. M., Nezworski, M. T., Lilienfeld, S. O., & Garb, H. N. (2003). What’s wrong with the Rorschach?: Science confronts the controversial inkblot test. San Francisco: Jossey-Bass.

Woodworth, R. S. (1919). Examination of emotional fitness for war. Psychological Bulletin, 15, 59–60.

Yoon, K., Schmidt, F., & Ilies, R. (2002). Cross-cultural construct validity of the five-factor model of personality among Korean employees. Journal of Cross-Cultural Psychology, 33(3), 217–235.