Ulrich Schimmack
Shigehiro Oishi
Abstract
Personality ratings of 25 Big Five items by two national samples (US, Japan) were analyzed using an item-level measurement model that separates method factors (acquiescence, halo bias) from trait factors. The results show a strong influence of halo bias on American responses that distorts cultural comparisons of personality. After correcting for halo bias, the Japanese were more conscientious, more extraverted, more open to experience, and less neurotic and less agreeable than Americans. The findings support cultural differences in positive illusions and raise questions about the validity of scale-based studies of cultural differences in personality.
Introduction
Cultural stereotypes imply cross-cultural differences in personality traits. However, cross-cultural personality studies do not support the validity of these stereotypes (Terracciano et al., 2005). When two measures yield different results, it is necessary to investigate the causes of the discrepancy. One obvious possibility is that cultural stereotypes are simply wrong. It is also possible that academic cross-cultural personality studies yield misleading results (Perugini & Richetin, 2007). One problem for empirical studies of cross-cultural personality differences is that cultural differences are often small. Culture explains at most 10% of the variance, and often much less. For example, McCrae et al. (2010) found that culture explains only 1.5% of the variance in agreeableness ratings. Because some of this variance is likely to be method variance, the variance due to actual differences in agreeableness is probably less than 1%. When valid variance is this small, method factors can have a strong impact on the pattern of mean differences between cultures.
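To make these percentages more concrete, the sketch below converts a share of variance explained into a standardized mean difference (Cohen's d). It assumes only two groups of equal size, which is a simplification because the figures above come from comparisons of many nations; the function names are illustrative and not taken from any package.

```python
import math


def eta_squared_to_d(eta_sq: float) -> float:
    """Cohen's d implied by a proportion of variance explained by group
    membership, assuming exactly two groups of equal size:
    eta^2 = d^2 / (d^2 + 4)  =>  d = 2 * sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta_sq must be in [0, 1)")
    return 2.0 * math.sqrt(eta_sq / (1.0 - eta_sq))


def d_to_eta_squared(d: float) -> float:
    """Inverse conversion under the same two-equal-groups assumption."""
    return d * d / (d * d + 4.0)


if __name__ == "__main__":
    # Shares of variance mentioned in the text: 10%, 1.5%, and 1%.
    for share in (0.10, 0.015, 0.01):
        print(f"{share:.3f} of variance -> d = {eta_squared_to_d(share):.2f}")
```

Under this assumption, 10% of the variance corresponds to about two thirds of a standard deviation and 1% to about 0.2 standard deviations, so a response-style difference of a few tenths of a standard deviation could match or reverse a valid difference of that size.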
A methodological problem in cross-cultural personality research is that personality measures are developed with a focus on the correlations among items within a population. Item means are not relevant, except that items must avoid floor or ceiling effects. Cross-cultural comparisons, however, depend on differences in item means. Because item means are not subjected to any psychometric evaluation, they may lack construct validity. Consider the item "work hard." How hard people work can be influenced by culture; for example, people in poor cultures have to work more to earn a living. The item may correctly reflect differences in conscientiousness within poor and within rich cultures, but the differences between cultures would reflect environmental conditions rather than conscientiousness. Consequently, it is necessary to show that cultural differences in item means are valid measures of cultural differences in personality.
Unfortunately, it is difficult to obtain data from a large sample of nations, and the resulting samples of nations are often quite small. McCrae et al. (2010) examined the convergent validity of national Big Five scores across 18 nations. The only significant evidence of convergent validity was obtained for neuroticism, r = .44, and extraversion, r = .45. Openness and agreeableness even produced small negative correlations, r = -.27 and r = -.05, respectively. The largest cross-cultural personality studies had 36 overlapping nations (Allik et al., 2017; Schmitt et al., 2007). The highest convergent validity was r = .4 for extraversion and conscientiousness; convergent validity was low, r = .2, for neuroticism and agreeableness, and it was zero for openness (Schimmack, 2020). These results demonstrate the difficulty of measuring personality across cultures and the lack of validated measures of cross-cultural personality profiles.
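Because these correlations are computed across nations, the relevant sample size is the number of nations, not the number of respondents. As a rough sketch (a standard Fisher z approximation, not necessarily the analysis used in the cited studies), the confidence interval around a correlation based on 18 nations is very wide:

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation via the
    Fisher z transformation; n is the number of paired observations
    (here, the number of nations)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # Fisher z transform
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = correlation_ci(0.44, 18)
print(f"r = .44 across 18 nations: 95% CI [{lo:.2f}, {hi:.2f}]")
```

For r = .44 and 18 nations, the interval runs from about zero to about .75, which illustrates why nation-level validity estimates from small samples of nations remain uncertain.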
Method factors in personality measurement
Personality self-ratings are known to be influenced by method factors. One factor is a stylistic tendency in the use of response formats known as acquiescence bias (Cronbach, 1942, 1965). The other factor reflects individual differences in responding to the evaluative meaning of items, known as halo bias (Thorndike, 1920). Both method factors can distort cross-cultural comparisons. For example, national stereotypes suggest that the Japanese are more conscientious than Americans, but mean conscientiousness scores in cross-cultural studies do not support this stereotype (Oishi & Roth, 2009). Both method factors may artificially lower the mean score for Japan, because Japanese respondents are less likely to use extreme response categories (Min, Cortina, & Miller, 2016) and Asians are less likely to inflate their ratings on desirable traits (Kim, Schimmack, & Oishi, 2012). In this article, we use structural equation modeling to separate method variance from trait variance in order to distinguish cultural differences in response tendencies from cultural differences in personality traits.
Convenience samples compared to national samples
Another problem for empirical studies of national differences is that psychologists often rely on convenience samples. The problem with convenience samples is that personality can change with age and that there are regional differences in personality within nations (). For example, a sample of students from New York University may differ drastically from a sample of students from Mississippi State University or Iowa State University. Although regional differences tend to be small, so do national differences. Therefore, even small regional differences can distort national comparisons. To avoid these biases, it is preferable to compare national samples that cover all regions of a nation and a wide age range.
Modeling approach
The purpose of our study is to advance research on cultural differences in personality by comparing a Japanese and an American sample that completed the same Big Five personality questionnaire, using a measurement model that distinguishes personality factors from method factors. The measurement model is an improved version of the halo-alpha-beta model of Anusic et al. (2009) (Schimmack, 2019). The model is essentially a three-factor model.

That is, each item loads on three factors: (a) a primary loading on one of the Big Five factors, (b) a loading on an acquiescence bias factor, and (c) a loading on the evaluative bias (halo) factor. Because Big Five measures typically do not have a simple structure, the model can also include secondary loadings on other Big Five factors. This measurement model has been successfully fitted to various Big Five questionnaires (Schimmack, 2019). This is the first time that the model has been applied in a multigroup analysis comparing the measurement models of American and Japanese samples.
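The mean structure implied by this kind of model can be sketched in a few lines. In a multigroup factor model, the expected mean of each raw item is its intercept plus each loading times the corresponding factor mean. The loadings below are hypothetical values for a balanced four-item scale, not estimates from this study, and the helper names are ours. The sketch shows why the two method factors behave differently: acquiescence loads in the same direction on every raw item, whereas halo loads positively on desirable items and negatively on undesirable ones.

```python
MIDPOINT = 3.0  # hypothetical common intercept: the midpoint of a 1-5 scale

# Hypothetical loadings (illustrative only, not estimates from this study):
# (primary trait, acquiescence, halo) for two desirable, positively keyed
# items and two undesirable, negatively keyed items.
ITEMS = {
    "desirable_1": (0.6, 0.2, 0.3),
    "desirable_2": (0.5, 0.2, 0.3),
    "undesirable_1": (-0.6, 0.2, -0.3),
    "undesirable_2": (-0.5, 0.2, -0.3),
}


def implied_item_means(trait_mean, acq_mean, halo_mean, intercept=MIDPOINT):
    """Model-implied raw item means: intercept + sum of loading x factor mean."""
    return {
        name: intercept + lp * trait_mean + la * acq_mean + lh * halo_mean
        for name, (lp, la, lh) in ITEMS.items()
    }


def keyed_scale_mean(item_means, midpoint=MIDPOINT):
    """Scale score as a deviation from the midpoint, after reverse-keying the
    negatively keyed items (equivalent to 6 - x on a 1-5 scale)."""
    keyed = [
        m - midpoint if ITEMS[name][0] > 0 else midpoint - m
        for name, m in item_means.items()
    ]
    return sum(keyed) / len(keyed)


# Hypothetical groups with identical trait means that differ in one method factor.
for label, acq, halo in [("baseline", 0.0, 0.0),
                         ("more acquiescence", 0.5, 0.0),
                         ("more halo", 0.0, 0.5)]:
    score = keyed_scale_mean(implied_item_means(0.0, acq, halo))
    print(f"{label}: {score:+.2f}")
```

With balanced keying, the acquiescence shift cancels in the scale score, whereas the halo shift does not; a group difference in halo therefore looks like a trait difference in scale means unless the halo factor is modeled.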
First, we fit a very restrictive model that assumes invariance across the two groups. Given the lack of cross-cultural psychometric comparisons, we expected that this model would not show acceptable fit. We then modified the model to allow for cultural differences in some primary loadings, secondary loadings, and item intercepts. This step makes our work exploratory. However, we believe that this exploratory work is a necessary first step toward psychometrically sound measurement of cultural differences.
Participants
Participants (N = 952 Japanese, 891 American) were recruited by Nikkei Research Inc. and its US affiliate using a national probability sampling method based on sex and age. The average age was 44 years. The data were previously used to compare the influence of personality on life-satisfaction ratings, but without comparing mean levels of personality and life satisfaction (Kim, Schimmack, Oishi, & Tsutsui, 2018).
Measures
The Big Five items were taken from the International Personality Item Pool (Goldberg et al., 2006). There were five items for each of the Big Five dimensions (Table 1).
Results
We first fit a model without a mean structure to the data. A model with strict invariance across the two samples did not have acceptable fit using RMSEA < .06 and CFI > .95 as criterion values, RMSEA = .064, CFI = .834. However, CFI values in models with single-item indicators are not expected to reach .95 (Anusic et al., 2009), so we focused on the RMSEA. We first examined the modification indices (MI) for the primary loadings, using MI > 30 as the criterion for freeing parameters to avoid overfitting the model. We found seven primary loadings whose release would substantially improve model fit (n4, e3, a1, a2, a3, a4, c4). Freeing these parameters improved the model (RMSEA = .060, CFI = .857). Next, we examined the loadings on the halo factor, because some items are likely to differ in connotative meaning across languages. However, we found only two noteworthy MIs (o1, c4). Freeing these parameters improved model fit (RMSEA = .057, CFI = .871). We also identified six secondary loadings that differed notably across cultures: one secondary loading on neuroticism (e4), four on agreeableness (n5, e1, e3, o4), and one on conscientiousness (n3). Freeing these parameters improved model fit (RMSEA = .052, CFI = .894).
We accepted this measurement model and proceeded to the model with a mean structure. The first model constrained the item intercepts and factor means to be equal across samples. This model had worse fit than the model without a mean structure (RMSEA = .070, CFI = .803). The highest MI was observed for the mean of the halo factor. Allowing the halo factor mean to differ across samples improved model fit substantially (RMSEA = .060, CFI = .849). Next, the MIs suggested allowing mean differences in extraversion and agreeableness. We then also allowed mean differences in the other factors. This further improved model fit (RMSEA = .058, CFI = .864), but to a smaller extent. The MIs suggested seven items with different item intercepts (n1, n5, e3, o3, a5, c3, c5). Freeing these parameters improved model fit to nearly the level of the model without a mean structure (RMSEA = .053, CFI = .888).
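The fit criteria and the MI > 30 release rule described above can be written as small functions. This is a sketch using common textbook formulas, not the code of the SEM program used here; programs differ in details such as N versus N - 1 and the sqrt(G) adjustment for G groups, so its values need not match a given program's output exactly. The parameter labels in the final line are hypothetical.

```python
import math


def rmsea(chisq: float, df: int, n: int, groups: int = 1) -> float:
    """Point estimate of the root mean square error of approximation:
    sqrt(max(chisq - df, 0) / (df * (n - 1))), multiplied by sqrt(groups)
    as one common multigroup adjustment."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(groups) * math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))


def cfi(chisq_m: float, df_m: int, chisq_b: float, df_b: int) -> float:
    """Comparative fit index of model M relative to baseline model B:
    1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)."""
    d_m = max(chisq_m - df_m, 0.0)
    d_b = max(chisq_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def parameters_to_free(mod_indices: dict[str, float], threshold: float = 30.0) -> list[str]:
    """Parameters whose modification index exceeds the threshold, largest first."""
    flagged = [p for p, mi in mod_indices.items() if mi > threshold]
    return sorted(flagged, key=lambda p: mod_indices[p], reverse=True)


print(parameters_to_free({"loading_x": 45.2, "loading_y": 31.0, "loading_z": 12.4}))
```

A high threshold such as 30 frees only parameters with large expected improvements in fit, which limits capitalization on chance in samples of this size.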
Table 1 shows the primary loadings and the halo factor loadings for the 25 items.

The results show very similar primary loadings for most items. This means that the factors have similar meaning in both samples, so that a comparison of the two cultures is possible. However, some differences may affect comparisons based on item sum scores. The item "feeling comfortable with people" loads much more strongly on extraversion in the US than in Japan. The agreeableness items "offend people" and "sympathize with others' feelings" also load more strongly in the US than in Japan. Finally, "make a mess" is an indicator of conscientiousness in the United States, but not in Japan. That item loadings fit the theoretical structure better in the US sample reflects the fact that the items were developed in the US.
An important new finding is that most halo factor loadings are also very similar across countries. For example, the item "have excellent ideas" has a high halo loading in both the United States and Japan. This result contradicts the notion that evaluative biases are culture-specific (Church et al., 2014). The only notable difference is the item "make a mess," which does not load notably on the halo factor in Japan. Even in English, the meaning of this item is ambiguous, and future studies should replace it with a better item. The correlation between the halo loadings in the two samples is high, r = .96.
Table 2 shows the item means and the item intercepts in the model.

In the US sample, item means are strongly correlated with halo factor loadings, r = .81. This is a robust finding in Western samples: the most desirable items are endorsed most strongly. This could be because individuals actually behave in desirable ways most of the time, because halo bias inflates item means, or both. Surprisingly, there is no notable correlation between item means and halo factor loadings in the Japanese sample, r = .08. This pattern of results suggests that American means are much more affected by halo bias than Japanese means. Additional evidence comes from the mean differences. For desirable items (low N, high E, O, A, and C), the US means are always higher than the Japanese means. For undesirable items, the US means are always lower than the Japanese means, with the exception of the item "stay in the background," where the means are the same. The mean differences are also strongly correlated with halo loadings, r = .90. In conclusion, there is strong evidence that halo bias distorts the comparison of personality between these two samples.
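The three item-level checks in this paragraph are ordinary Pearson correlations computed across items. The sketch below shows them with hypothetical helper names; it assumes one halo loading and one mean per item in each group, all in the same item order, and it does not reproduce the values in Tables 1 and 2.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def halo_diagnostics(halo_loadings, means_a, means_b):
    """Item-level checks for halo bias in a two-group comparison.

    Correlates the halo loadings with each group's item means and with the
    item-wise mean differences (group A minus group B).
    """
    diffs = [a - b for a, b in zip(means_a, means_b)]
    return {
        "r_halo_means_a": pearson(halo_loadings, means_a),
        "r_halo_means_b": pearson(halo_loadings, means_b),
        "r_halo_diff": pearson(halo_loadings, diffs),
    }
```

A strong positive correlation for one group, a near-zero correlation for the other, and a strong positive correlation for the differences is the pattern that points to a group difference in halo bias.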
Item intercepts show cultural differences in items after accounting for cultural differences in halo and the other factors. Notable differences were found for some items. Even after controlling for halo and extraversion, American respondents reported feeling more comfortable with people than Japanese respondents did. This difference is consistent with cultural stereotypes. After correcting for halo bias, Japanese respondents now score higher than Americans on completing tasks immediately, which also fits cultural stereotypes. However, Americans still report paying more attention to detail than the Japanese, which contradicts cultural stereotypes. Extensive validation research is needed to examine whether these results reflect actual cultural differences in personality and behavior.
Figure 2 shows the mean differences in the Big Five factors and the two bias factors.

Figure 2 shows a very large difference in halo bias, so large that it seems implausible. Possibly the model overcorrects, which would bias the mean differences in the actual traits in the opposite direction. There is little evidence of cultural differences in acquiescence bias. An open question is whether the strong halo effect is due solely to evaluative bias. It is also possible that a modesty bias is at play, because modesty implies less extreme responses to both desirable and undesirable items. To separate the two, it would be necessary to include items describing common and rare behaviors that are evaluatively neutral.
The most interesting result for the Big Five factors is that, after removing the halo bias, the Japanese sample scores higher than the US sample in conscientiousness. This reverses the mean difference in the raw scores of this sample and in previous studies, which showed higher conscientiousness in US samples than in Japanese samples (). The present results suggest that halo bias masks the real difference in conscientiousness. However, other results are more surprising. In particular, the present results suggest that the Japanese are more extraverted than Americans. This contradicts cultural stereotypes and previous studies. The problem is that cultural stereotypes can be wrong and that previous studies have not controlled for halo bias. More research using real-world behaviors and less evaluative items is needed to draw meaningful conclusions about cross-cultural personality differences.
Discussion
It has been known for 100 years that personality self-ratings are influenced by the connotative (evaluative) meaning of items. At least in North America, it is common to see a strong correlation between the desirability of an item and the mean of self-ratings. There is also consistent evidence that Americans rate themselves as more desirable than the average American (). However, this does not mean that Americans think they are better than everyone else. In fact, self-ratings tend to be slightly less favorable than ratings by friends or family members (), indicating a general evaluative bias to rate both oneself and others favorably.
Given the prevalence of evaluative bias in personality ratings, it is surprising that halo bias has received so little attention in cross-cultural personality research. One reason may be the lack of a good way to measure and remove halo variance from personality ratings. Despite early attempts to identify socially desirable responding, lie scales have shown little validity as measures of bias (Ref). The problem is that observed scores on lie scales contain valid personality variance as well as bias variance, so correcting for scores on these scales literally throws out the baby (valid variance) with the bathwater (bias variance). Structural equation modeling (SEM) solves this problem by decomposing observed variance into unobserved, or latent, sources of variance. However, personality psychologists have been reluctant to use SEM because item-level models require large samples and because overly simple theoretical models fit poorly. Building on multi-rater studies that emerged in the 1990s, we developed a Big Five measurement model that separates personality variance from evaluative bias variance (Anusic et al., 2009; Schimmack, Kim, & 2012; Schimmack, 2019). Here we apply this model to cross-cultural data for the first time to examine whether cultures differ in halo bias. The results suggest that halo bias has a strong impact on personality ratings in the US, but not in Japan. Cultural differences in halo bias distort comparisons of actual personality traits. Whereas the raw scores suggest that the Japanese are less conscientious than Americans, the factor means corrected for halo bias suggest the opposite. The Japanese participants also appeared to be less neurotic, more extraverted, and more open to experience, which was a surprising finding. Correcting for halo bias did not change the cultural difference in agreeableness: Americans were more agreeable than the Japanese with and without correction for halo bias.
Our results do not provide a conclusive answer to cultural differences in personality, but they shed new light on some questions in personality research.
Cultural differences in self-enhancement
An unresolved question in personality psychology is whether the positive bias in self-perceptions, also known as self-enhancement, is unique to American or Western cultures or a universal phenomenon (Church et al., 2016). One problem is that self-enhancement has been measured in different ways. The most widely used method relies on social comparisons, in which individuals compare themselves to an average person. These studies tend to show robust better-than-average effects across cultures (Ref). However, this finding does not mean that halo biases are equally strong across cultures. Brown and Kobayashi (2002) found better-than-average effects in the United States and Japan, but Japanese self-ratings and peer ratings were less favorable than American ones. Kim et al. (2012) explain this pattern with a general positivity norm in North America that affects both self-ratings and ratings of others. Our results are consistent with this view and suggest that self-enhancement is not a universal tendency. More cross-cultural research is needed to examine which cultural factors moderate halo biases.
Rating biases or self-perception biases
An open question is whether halo biases are purely rating biases or reflect distorted self-perceptions. One model states that participants know their true selves but present themselves in a more positive light to others. Another model suggests that individuals actually believe that their personality is more desirable than it really is. It is not easy to distinguish empirically between these two models.
Halo Bias and the Reference Group Effect
In an influential article, Heine et al. (2002) criticized cross-cultural comparisons of personality ratings as invalid. The main argument was that respondents calibrate response categories to cultural norms, an adjustment that has been called the reference group effect. For example, the item "insulting people" is not answered based on the absolute frequency of insulting others, or on its frequency relative to other behaviors, but in comparison to the typical frequency of insults in a particular culture. The main prediction of the reference group effect is that, in all cultures, responses should cluster around the midpoint of the Likert scale, which represents the typical frequency of the behavior. As a result, cultures can vary dramatically in the actual frequency of insults while mean scores on subjective rating scales are identical.
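A toy version of the reference-group account makes its prediction explicit. In this hypothetical rating rule (our simplifying assumption, not a model proposed by Heine et al.), each respondent rates a behavior by how far his or her own frequency departs from the typical frequency in the same culture, so cultures with very different actual frequencies end up with the same mean rating at the scale midpoint.

```python
def reference_group_ratings(frequencies, midpoint=3.0, slope=1.0, lo=1.0, hi=5.0):
    """Hypothetical rating rule for the reference group effect.

    Each respondent's rating is the scale midpoint plus the deviation of his
    or her own frequency from the culture's typical (mean) frequency, so the
    culture's typical level maps onto the midpoint. Ratings are clipped to
    the 1-5 response range.
    """
    typical = sum(frequencies) / len(frequencies)
    return [min(hi, max(lo, midpoint + slope * (f - typical))) for f in frequencies]


def mean(values):
    return sum(values) / len(values)


# Two hypothetical cultures whose actual weekly frequencies differ threefold.
culture_a = [1.0, 2.0, 3.0]
culture_b = [5.0, 6.0, 7.0]
print(mean(culture_a), mean(culture_b))          # actual frequencies differ
print(mean(reference_group_ratings(culture_a)),
      mean(reference_group_ratings(culture_b)))  # mean ratings do not
```

Under this rule, item means cannot move far from the midpoint in the direction of item desirability, which is the pattern the US means in this study show.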
The present results are not compatible with a simple reference group effect. In particular, US item means varied notably as a function of item desirability. Undesirable items such as "offend people" had means far below the scale midpoint of 3 (M = 1.83), and desirable items such as "have excellent ideas" had means far above it (M = 3.73). This finding suggests that halo bias, rather than the reference group effect, threatens the validity of cross-cultural comparisons.
Reference group effects may play a larger role in Japan, where item means were unrelated to item desirability and clustered closer to the scale midpoint. The highest mean was 3.56 for concern, and the lowest mean was 2.45 for feeling comfortable with people. However, other evidence contradicts this hypothesis. After removing the effects of halo and the other personality factors, item intercepts in the two national samples were still highly correlated, r = .91. This result is inconsistent with culture-specific reference groups, which would not be expected to produce such consistent item intercepts.
Our results also provide a new explanation for the low conscientiousness scores of Japanese samples. A reference group effect would not predict notably lower conscientiousness. However, a stronger halo effect in the US explains this finding, because conscientiousness is generally measured with desirable items. Our results are also consistent with the finding that self-esteem and self-enhancement are higher in the United States than in Japan (Heine & Buchtel, 2009). These biases inflate conscientiousness scores in the US. When this bias is removed, the Japanese see themselves as more conscientious than Americans.
Limitations and Future Directions
We reiterate previous calls for validation of national personality scores (Heine & Buchtel, 2009). Current results are inconsistent across questionnaires, and even the low convergent validity may be inflated by cultural differences in response styles. Future studies should attempt to measure personality with items that minimize social desirability and use response formats that avoid reliance on reference groups (e.g., frequency estimates). Additionally, rating-based results need to be validated against objective behavioral indicators.
Future research should also take advantage of advances in psychological measurement and use models that can identify and control for response artifacts. The present model demonstrates that evaluative bias, or halo variance, can be separated from actual personality variance. Future studies should use this model to compare a larger number of nations.
The main limitation of our study is the relatively small number of items. The larger the number of items, the easier it is to distinguish item-specific variance, method variance, and trait variance. The measure also failed to take into account that the Big Five are higher-order factors of more specific traits called facets. Measures such as the BFI-2 or the NEO-PI-3 should be used to examine cultural differences at the facet level, which often show unique cultural influences beyond effects on the Big Five (Schimmack, 2020).
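One way to see why more items help is the Spearman-Brown formula, which gives the reliability of a composite of k parallel items, each with reliability rho: rho_k = k * rho / (1 + (k - 1) * rho). Parallel items are a strong simplifying assumption that the present measurement model does not make, and the single-item reliability of .30 below is hypothetical; the sketch only illustrates how item-specific variance shrinks as items are aggregated.

```python
def spearman_brown(rho: float, k: float) -> float:
    """Reliability of a composite of k parallel items, each with reliability rho."""
    if not 0.0 <= rho <= 1.0 or k <= 0:
        raise ValueError("rho must be in [0, 1] and k must be positive")
    return k * rho / (1.0 + (k - 1.0) * rho)


# Hypothetical single-item reliability of .30; the present study used 5 items per factor.
for k in (1, 5, 10, 20):
    print(k, round(spearman_brown(0.3, k), 2))
```

With this hypothetical value, five items yield a composite reliability of about .68 and ten items about .81.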
We conclude with a statement of scientific humility. The present results should not be taken as clear evidence of cultural differences in personality. Our article is just one small step toward the goal of measuring personality differences across cultures. One obstacle to revealing such differences is that national differences appear to be small relative to personality differences within nations. One possible explanation is that personality variation is caused more by biological than by cultural factors. For example, twin studies suggest that 40% of the variance in personality traits is due to genetic variation within a population, whereas cross-cultural studies suggest that at most 10% of the variance is due to cultural influences on population means. Therefore, although the discovery of cultural differences in personality is of great academic interest, evidence of cultural differences between nations should not be used to stereotype individuals from different nations. Finally, it is important to distinguish between the personality traits captured by the Big Five and other personality attributes, such as attitudes, values, or goals, that may be more influenced by culture. The main contribution of this article is to show that cultural differences in response styles exist and bias national personality comparisons based on simple scale means. Future studies should take response styles into account.
References
Cronbach, L. J. (1942). Studies of acquiescence as a factor in true-false tests. Journal of Educational Psychology, 33(6), 401–415. https://doi.org/10.1037/h0054677
Heine, S. J., & Buchtel, E. E. (2009). Personality: The universal and the culturally specific. Annual Review of Psychology, 60, 369–394. https://doi.org/10.1146/annurev.psych.60.110707.163655
Perugini, M., & Richetin, J. (2007). In the land of the blind, the one-eyed man is king. European Journal of Personality, 21(8), 977–981. https://doi.org/10.1002/per.649
Schimmack, U. (2020). Personality science: The science of human diversity. Top Hat. ISBN 978-1-77412-253-2. https://tophat.com/marketplace/social-science/psychology/full-course/personality-science-the-science-of-human-diversity-ulrich-schimmack/4303/
Terracciano, A., et al. (2005). National character does not reflect mean personality trait levels in 49 cultures. Science, 310, 96–100.