Achievement or Social Background? The Impact of Tracking on the Composition of Schools in an International Comparison.

Brinkmann, Maximilian ; Huth-Stöckle, Nora ; et al.

In: Zeitschrift für Soziologie, Vol. 53 (2024-06-01), Issue 2, pp. 164-185


This study explores the implications of early between-school tracking within educational systems – a practice that involves sorting students into different educational pathways based on their achievement levels. We examine two potential effects of this process: (i) the promotion of homogeneous learning environments through tracking, and (ii) the potential for tracking to exacerbate social segregation among schools. To scrutinize these effects, we analyze data from the assessment studies PISA, TIMSS, and PIRLS (1995–2019). Additionally, we investigate whether school selectivity influences the tracking effects. Using difference-in-differences models combined with multiverse analyses, our findings demonstrate that early between-school tracking indeed contributes to the homogeneity of learning environments and can lead to increased social school segregation. However, our results do not indicate a moderating role of school selectivity. [ABSTRACT FROM AUTHOR]

Summary (translated from the German): In this study, we examine the early allocation of students with different achievement levels to schools offering different educational tracks ("tracking"). We consider two possible effects of this practice: (i) the emergence of homogeneous learning environments and (ii) the reinforcement of social segregation between schools. To analyze these effects, we use data from all PISA, TIMSS, and PIRLS studies conducted between 1995 and 2019. In addition, we examine whether the selectivity of schools influences the tracking effects found. The results of our difference-in-differences models and the accompanying multiverse analyses show that early tracking does indeed contribute to the homogeneity of learning environments and can lead to increased social segregation. However, our results do not point to a moderating role of selectivity. [ABSTRACT FROM AUTHOR]

German title: Leistung oder Herkunft? Auswirkungen von Tracking auf die Zusammensetzung von Schulen im internationalen Vergleich

Keywords: Early between-school Tracking; Difference-in-Differences; Multiverse Analysis; Segregation; Tracking; Education System

Article note: Maximilian Brinkmann and Nora Huth-Stöckle share first authorship.

1 Introduction

Over the past decades, scholars have debated the merits of sorting students into secondary schools of different educational tracks based on their scholastic achievement at the end of primary school, referred to as early between-school tracking (hereinafter "tracking" or "early-tracking"). In particular, the question of whether tracking increases overall achievement and reinforces social inequalities in achievement has led to a large body of empirical research (Betts 2011; Hanushek & Wößmann 2006; Strello et al. 2021; Van de Werfhorst & Mijs 2010; Terrin & Triventi 2022). Overall, tracking appears to increase social inequalities without fostering educational achievement (Hanushek & Wößmann 2006; Matthewes 2021; Strello et al. 2021; Terrin & Triventi 2022; Van de Werfhorst & Mijs 2010), but there is no definite consensus (Betts 2011; Esser 2016; Esser & Seuring 2020).

Consequently, the effects of educational tracking on achievement and inequality of achievement continue to be a topic of intense debate (Esser & Seuring 2020 vs. Heisig & Matthewes 2022 vs. Esser & Seuring 2023), which is mirrored by an equally controversial public debate in countries that employ this practice (e.g., Austria, Germany). Two potential effects are central to the debate around early between-school tracking and its impact. The first (intended) effect is the formation of homogeneous learning environments (Esser & Seuring 2020; Hanushek & Wößmann 2006; Matthewes 2021). In tracked systems, students are sorted into different types of secondary schools based primarily on their performance at the end of primary school. The resulting homogeneous learning environments constitute a central rationale for the implementation of tracking because they are theorized to increase the efficiency of instruction (Matthewes 2021). The second (unintended) effect is the potential reinforcement of social stratification, as the sorting process after primary school can be closely related to students' socioeconomic status, giving rise to socially segregated secondary schools. Furthermore, the selectivity of the education system – the extent to which the sorting process is based on prior achievement – may influence both of these processes. In more selective systems, parental influence on the sorting process is limited, which could lead to more homogeneous learning environments and less social segregation in schools (Esser & Seuring 2020; Heisig & Matthewes 2022; Lorenz et al. 2023).

Considering the controversial nature of early between-school tracking, it is surprising that these ambivalent effects of tracking – homogenous learning environments and social segregation – remain largely untested (however, see Strello et al. 2021; [10] & Raabe 2023). While the vast majority of previous research has been concerned with the effects of tracking on levels or inequality in achievement, we take a step back and study a) whether there is empirical support for the effects in question and b) the extent to which they are influenced by the degree of selectivity.

To operationalize these questions, we pooled data from twenty large-scale assessment studies (i.e., PIRLS, TIMSS, PISA) conducted between 1995 and 2019. This allows us to estimate how similar students are in a given school with regard to their achievement and social status – both in primary and secondary school and across 63 unique countries and regions. We then constructed synthetic cohorts and estimated Difference-in-Differences (DiD) models. In doing so, we exploit the fact that no country tracks students in primary school (Hanushek & Wößmann 2006) and observe tracked and non-/late-tracked education systems before and after a potential tracking policy was administered. Lastly, we apply multiverse analyses (Simonsohn et al. 2020; Steegen et al. 2016): instead of conducting a single analysis – and possibly some robustness checks – we systematically vary decisions regarding variable operationalization, sample composition, and model choice (Gelman & Loken 2013).

This study makes two main contributions to the literature. First, we enhance our understanding of the effects of early between-school tracking by empirically evaluating two core mechanisms. In doing so, we place the central vantage points from which research on tracking effects is understood (i.e., efficiency via homogeneous learning environments vs. stratification via social segregation) on an empirical footing. Second, we motivate the use of multiverse analysis in educational research. Using multiverse analysis means that we systematically vary our data-analytic decisions, make them transparent, and show their impact on results (Young & Holsteen 2017). A systematic investigation of this kind is particularly relevant in this field because combining different large-scale assessment surveys is common practice and involves extensive harmonization steps and data-analytic decisions.

Across 5,760 model specifications, we found consistent evidence that early between-school tracking increases the homogeneity of learning environments. Furthermore, our analyses across 3,840 model specifications suggest that tracking students at an early age can increase social segregation across schools. However, we do not find evidence that the selectivity of education systems moderates the effects of tracking on the homogeneity of learning environments or social segregation.

2 Mechanisms of Early Between-School Tracking

2.1 Early Between-School Tracking

There are various but often similar ways of conceptualizing tracking. Most definitions identify tracking as some sort of ability grouping: "the practice of assigning students to instructional groups on the basis of ability" (Hallinan 1994: 79; see also Betts 2011: 342; Esser 2016: 334; [29] & [37] 2007: 2; LeTendre et al. 2003: 44). Consequently, tracking is found in almost every education system in the world, in one way or another. Crucial, however, is the considerable variety across countries when it comes to a) the level (or extent) and b) the age at which tracking is administered. For instance, many countries have implemented streaming, which refers to ability grouping within one type of school for certain topics, subjects or classes (LeTendre et al. 2003: 44). It is also common that countries group students in vocational or academic tracks when they approach maturity (e.g., after Grade 9 or 10).

However, the main part of the controversy that surrounds tracking concerns the existence of early between-school tracking, where a high level of tracking and a low age of tracking coincide (Hanushek & Wößmann 2006; Strello et al. 2021). In prototypical early between-school tracking countries such as Germany, Austria, or the Netherlands, students are actively sorted into different types of secondary schools after four or six years of primary school. Typically, these secondary schools prepare for either vocational or academic training, though intermediate or mixed types also exist. Early between-school tracking therefore has substantive consequences for students' educational careers, since different school types (tracks) imply different curricula and educational credentials.

Our definition of tracking therefore refers to early between-school tracking, which implies an active sorting of students after primary school. We define early tracking countries as those countries in which students are already tracked in different types of schools in grade 8. Because the concept of "early" is ambiguous, we will also consider countries that track in grade 9 as an alternative operationalization of early tracking.

2.2 Effects of Early Between-School Tracking

A large body of empirical literature has investigated the effects of early between-school tracking on students' achievement and (in-)equalities in achievement (Brunello & Checchi 2007; Esser & Seuring 2020; Hanushek & Wößmann 2006; Heisig & Matthewes 2022; Matthewes 2021; Pfeffer 2008; Schütz et al. 2008; Strello et al. 2021; [44] 2019). Though not all results point in the same direction, most of the evidence suggests that tracking is associated with higher levels of inequality in achievement while there is little evidence suggesting that tracking increases overall achievement (Terrin & Triventi 2022). Hence, the majority of results cannot support the idea of an "equality-efficiency tradeoff", which would indicate both higher levels of achievement and inequalities in achievement as discussed by some authors (Hanushek & Wößmann 2006; Matthewes 2021). This reading of the literature is supported by a recent meta-analysis (Terrin & Triventi 2022).

The empirical literature suggests that educational tracking does not have positive effects on achievement but increases inequality (Terrin & Triventi 2022). However, a few but vocal scholars remain skeptical, most prominently Hartmut Esser. This is also evidenced in a recent controversy, in which the results of a study concerning tracking and selectivity (Esser & Seuring 2020) were challenged by other researchers (Heisig & Matthewes 2022; see Esser & Seuring 2023 for a reply). In this context, we argue that the debate would benefit from a more comprehensive understanding of tracking effects. For example, a naive explanation for the observed empirical patterns (see Terrin & Triventi 2022) is that tracked systems lead to increased social segregation without promoting homogeneity in achievement. A more nuanced explanation could involve the interplay of homogeneous learning environments, social segregation, and resource stratification (unequal allocation of resources across tracks), resulting in a zero-sum game for achievement but heightened inequalities (Betts 2011; Terrin & Triventi 2022). Although our study does not aim to delve into these explanatory models, it is crucial to highlight that both explanations are plausible given the limited empirical evidence available. Does tracking lead to more homogeneous learning environments? Does it amplify social segregation? In the following sections, we demonstrate that both perspectives can be reasonably argued and are not mutually exclusive.

A common rationale for the implementation of tracking is efficiency (Brunello & Checchi 2007; Hanushek & Wößmann 2006; Matthewes 2021). This notion emphasizes that sorting is based on ability, or at least on ability proxies such as scholastic achievement at the end of primary school. As a result, students in a given track tend to be more similar in terms of their achievement, creating homogeneous learning environments. Such environments are posited as prerequisites for the higher efficiency of tracked systems, as they allow teachers and schools to tailor their entire mode of instruction to learning groups characterized by only minor variances in achievement. Consequently, learning could become more efficient for all students within these systems, and the early sorting of students ensures that these positive effects are sustained over a prolonged period (Brunello & Checchi 2007).

In contrast, critics of early between-school tracking contend that the sorting process primarily perpetuates the existing societal stratification. Differences in track placement may often reflect disparities in school preparedness, in familiarity with the school environment, and in parents' ability to intervene on behalf of their children in an educational setting (Boone & Van Houtte 2013; Dumont et al. 2019; Neuenschwander & Grunder 2010). Thus, critics fear that the sorting of students is predominantly based on social background rather than actual ability. Students from advantaged backgrounds will disproportionately attend higher tracks, which offer prestigious credentials. Disadvantaged students, on the other hand, will be much more likely to attend lower tracks with poorer learning conditions, partly as a result of negatively selected learning environments in which disadvantaged students have fewer opportunities to learn from high-achieving students (Matthewes 2021). As a consequence, disadvantaged students can experience stigmatization and obtain less prestigious credentials ([17] 1986; Hallinan 1994; Pfeffer 2008; Van de Werfhorst & Mijs 2010).

Although these different perspectives on tracking may strongly disagree on its merits, both are viable. Tracking could lead to homogenous learning environments and socially segregated schools. This phenomenon can be understood within Boudon's theoretical framework (1974), which explains differences in educational achievement through primary and secondary effects. Primary effects suggest an association between achievement and a student's social status, indicating that students from a higher social status, on average, achieve higher levels of academic success at the end of primary school. Hence, if the sorting process is primarily based on observed achievement, track placement may still align with a student's social status, resulting in homogenous learning environments and social segregation. Consequently, our study aims to investigate two hypotheses.

H1: Compared to non-/late-tracking systems, students in early between-school tracking systems show a higher similarity in secondary schools than in primary schools with respect to their achievement.

H2: Compared to non-/late-tracking systems, students in early between-school tracking systems show a higher similarity in secondary schools than in primary schools with respect to their social status.

2.3 The Moderating Role of School Selectivity

While tracked systems aim to select students based on their ability (i.e., via observed scholastic achievement), many education systems do not strictly select on achievement alone but allow parents and teachers to intervene in the selection process. This enables so-called secondary effects to play out (Boudon 1974). The concept of secondary effects suggests that families of different social statuses make different decisions regarding the educational paths of children with similar levels of achievement (Boudon 1974; [5] & Goldthorpe 1997; [11] & Jonsson 1996), an idea that has found vast empirical support across various countries (Boone & Van Houtte 2013; Dumont et al. 2019; [12] & Rudolphi 2010). However, these secondary effects can manifest differently depending on how the sorting process at the end of primary school is organized.

For example, in German-speaking countries (Austria, Germany, Switzerland) and the Flemish part of Belgium, students' achievement at the end of primary school is the basis for track recommendations, but the selection process may also involve school or parental influence (Boone & Van Houtte 2013; Dumont et al. 2019; Neuenschwander & Grunder 2010). However, sorting students into different tracks may be more rigid in other countries. In Singapore, students must take a nationwide standardized test that essentially determines their track choice (Singapore Examinations and Assessment Board 2023; [47] et al. 2020). In Luxembourg, a school council decides on track recommendations that are typically binding ([1] & Hadjar 2017; [25] et al. 2015). The Netherlands has a different approach, combining a nationwide standardized test with primary school teachers' recommendations and family preferences ([35] et al. 2018).

Given this heterogeneity, tracked systems vary in their degree of selectivity, which refers to the extent to which the sorting process is based on achievement. Within the framework of secondary effects (differential decision-making based on social status), the selectivity of a system can shape the extent to which social status overrides the observed achievement. In less selective systems, families with higher social status may "game the system" (Dumont et al. 2019), enabling their children to move to higher tracks despite insufficient levels of achievement. By contrast, higher selectivity may prevent such behavior, as students are primarily sorted based on their achievement. Consequently, higher selectivity should lead to more homogeneous learning environments and less social segregation in schools. However, it is important to note that this does not imply the absence of social segregation, since primary effects (the association between achievement and social status) still cause differences in achievement at the end of primary school based on social status. Previous research (focusing on single countries) has indicated either heterogeneous (e.g., Esser & Hoenig 2018; [23] & Helbig 2015; Lorenz et al. 2023) or inconclusive effects of selectivity (Esser & Seuring 2020 vs. Heisig & Matthewes 2022 vs. Esser & Seuring 2023).

Given this background, we examine selectivity from a cross-country perspective and inquire whether selectivity moderates the effects of tracking on the homogeneity of learning environments and the degree of social segregation:

H3a: In early between-school tracking systems, the greater the emphasis on achievement (rather than SES) in the selection process, the more similar students are within secondary schools compared to primary schools with respect to their achievement.

H3b: In early between-school tracking systems, the greater the emphasis on achievement (rather than SES) in the selection process, the less similar students are within secondary schools compared to primary schools with respect to their social status.

3 Methods and Data

Rather than estimating a single set of models to examine the effect of early between-school tracking, we conduct multiverse analyses (Steegen et al. 2016). Multiverse analyses help to investigate the robustness of the results to various data analytic decisions made during the research process. Researchers face numerous and sometimes arbitrary data analytic decisions, creating a "garden of forking paths" (Gelman & Loken 2013; Young & Holsteen 2017), where a single analysis represents only one possibility from a larger set of alternatives. Multiverse analyses address two key challenges in scientific research: the enhancement of transparency and the uncertainty in modeling (Young & Holsteen 2017). In our study, we estimated various plausible models which systematically vary in their characteristics. We then visualized the different estimated coefficients for the tracking effect and evaluated their robustness and consistency across the different specifications using specification curves (Simonsohn et al. 2020) and influence regressions (Young & Holsteen 2017).
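The logic of enumerating a multiverse can be illustrated with a minimal Python sketch. The decision dimensions and option labels below are hypothetical placeholders chosen for illustration; they are not the actual specification grid of this study, which is summarized in Table 1.

```python
from itertools import product

# Hypothetical analytic decisions; labels are illustrative assumptions only.
decisions = {
    "outcome_measure": ["icc", "dissimilarity"],
    "early_tracking_cutoff": ["grade8", "grade9"],
    "sample": ["all", "no_low_income", "no_low_or_lower_middle_income"],
    "covariates": ["none", "gdp", "gdp_density"],
}


def build_multiverse(decisions):
    """Return one dict per combination of analytic choices."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]


specs = build_multiverse(decisions)
print(len(specs))  # number of specifications: 2 * 2 * 3 * 3
```

In such a setup, each specification would be passed to the same estimation routine, and the resulting tracking coefficients could then be sorted and plotted as a specification curve.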

In this section, we first describe our analytical decisions concerning data exclusion, handling of missing data, choice of statistical model, operationalization, and inclusion of variables, as well as reasonable alternatives to these decisions. Table 1 summarizes our choices, and Table A1 in the Appendix presents the different operationalizations of our measures. The specifications presented in italics represent our initial or preferred choices, i.e. the model we would have estimated if we had not conducted a multiverse analysis. It is important to note that "preferred" does not necessarily mean that these choices are superior in all cases; we will demonstrate that the different specifications are sometimes equally plausible. In the analyses section, we discuss the results of our initial specification and subsequently explore the robustness of these results using multiverse analyses.

3.1 Data

To test our hypotheses, we combined data from multiple school assessment studies, including the Progress in International Reading Literacy Study (PIRLS: 2001, 2006, 2011, 2016), the Trends in International Mathematics and Science Study (TIMSS) 4th grade (1995, 2003, 2007, 2011, 2015) and 8th grade (1999, 2007, 2011, 2015, 2019), and the Programme for International Student Assessment (PISA: 2000, 2006, 2009, 2012, 2015, 2018). These datasets were chosen for their suitability to our research question based on three key reasons. First, they provide information on students' educational achievements and family backgrounds, allowing us to measure homogeneity of learning environments and social segregation in schools. Second, the datasets encompass a range of countries exhibiting both tracked and non-/late-tracked education systems, enabling us to compare students from both types of systems. Third, the datasets cover both pre-tracking (4th grade in TIMSS and PIRLS) and post-tracking periods (8th grade in TIMSS and 15-year-old students in PISA), facilitating a difference-in-differences approach to comparing students before and after tracking (further details below).

As we are interested in differences between tracked and non-/late-tracked countries, we aggregated the student-level data on the country level for each cohort, using sample weights. Thus, country-level aggregates constitute the units of analysis in all models. Our final dataset includes 63 countries (with 876 country-cohort observations), nine of which already implement tracking in the 8th grade or earlier (see Table A2 in the Appendix for a detailed overview). While the data are generally well suited for addressing our research question, it is important to note four key points. First, the studies employ different achievement measures: PIRLS and TIMSS assess skills and knowledge taught in schools, whereas PISA evaluates students' ability to apply these skills and knowledge. However, the differences in assessments do not conflict with our research design, since we are primarily interested in the distribution of test scores across schools and country aggregates, focusing on the similarity of students within schools.

Second, the studies differ in their sampling strategies. TIMSS and PIRLS select students from specific grades (e.g., 4th or 8th grade), whereas PISA tests students who are approximately 15 years old, regardless of their grade level. This poses a challenge for our analysis because students in the same school may differ in achievement simply because they are in different grades. Moreover, some country samples include students from grades before and after they are tracked into different educational paths. Ignoring this would lead to inaccurate estimates of the potential effects of tracking because our outcomes measure the similarity of students within schools (see the measures section below). To address this challenge, we generated the country-level aggregates based only on students from one grade and excluded students who have already been tracked in countries that are not classified as "early-trackers." We selected the grade with the highest number of student-level observations (referred to as the "modal grade") and set a minimum requirement of 1,000 student-level observations. Where a country tracks students after grade 9 but grade 10 is the modal grade, we selected grade 9 instead, provided that grade 9 represents at least 25 percent of the sample and contains more than 1,000 student-level observations.
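The grade-selection rule described above can be sketched in a few lines of Python. The function name, the input format (a mapping from grade to number of student-level observations), and the decision to return `None` when no grade qualifies are assumptions made for illustration, not details taken from the study's code.

```python
def select_grade(counts, tracks_after_grade=None, min_obs=1000, min_share=0.25):
    """Pick the analysis grade for one country sample.

    counts: dict mapping grade -> number of student-level observations.
    tracks_after_grade: grade after which the country tracks (None if unknown
        or not applicable); only the grade-9 case is handled here.
    Returns the selected grade, or None if no grade qualifies (assumption).
    """
    total = sum(counts.values())
    modal = max(counts, key=counts.get)

    # Special case: tracking starts after grade 9 but grade 10 is modal.
    if tracks_after_grade == 9 and modal == 10:
        n9 = counts.get(9, 0)
        if n9 / total >= min_share and n9 > min_obs:
            return 9
        return None  # assumption: drop the sample rather than use tracked students

    # Default case: modal grade, subject to the minimum sample size.
    return modal if counts[modal] >= min_obs else None


print(select_grade({9: 1500, 10: 3000}, tracks_after_grade=9))
print(select_grade({8: 200, 9: 4000, 10: 900}))
```

The thresholds are exposed as parameters so that alternative cut-offs could be varied as part of a robustness check.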

A third challenge arising from the data is the heterogeneity within the sample of countries (refer to Table A3 in the Appendix). To ensure that any observed tracking effects do not stem from specific characteristics of the sample composition, we employed various samples for the multiverse analyses. We re-ran our models to include only economically comparable countries, according to the World Bank's classification by gross national income (World Bank 2021a). In particular, we proceeded with an alternate specification that omitted low-income economies, and a subsequent specification excluding both low-income and lower-middle-income economies. Moreover, in further model specifications, we excluded countries with selective student populations in secondary school, that is, countries where only a restricted segment of students proceeds to the 8th or 9th grade. Specifically, we removed countries with a secondary school enrollment rate below 80 percent.

Finally, the surveys differ with regard to the achievement test content. PIRLS focuses exclusively on reading achievement, whereas TIMSS surveys math and science achievement. PISA covers all three domains with a focus on one specific domain per survey year. Consequently, our sample sizes vary depending on the respective outcome variable. Models estimating homogeneity in achievement are based on fewer observations compared to models that assess social segregation since information on students' social backgrounds has been collected in all three surveys.

3.2 Models

We estimate different models to test our hypotheses. Whether early between-school tracking leads to homogeneous learning environments (Hypothesis 1) is investigated in model M1a. In model M1b, we investigate whether early between-school tracking increases social segregation (Hypothesis 2). The question of whether selectivity moderates the effects of tracking is investigated in models M2a (outcome: homogeneous learning environments) and M2b (outcome: social segregation).

3.3 Measures

In the following section, we give a detailed description of the operationalization of our measures and their different variants. An overview of the various operationalizations is provided in Table A1 in the Appendix.

3.3.1 Outcome Variables

We assess two dependent variables: (1) homogeneity of learning environments and (2) social segregation. To assess (1) the homogeneity of learning environments, we used achievement scores (plausible values) and computed the average test score for each subject. In our initial model, we focused on math achievement scores because most country cohorts are available for this outcome. However, alternative specifications employ science and reading achievement scores. The (2) homogeneity of a school's socioeconomic composition (i.e., social segregation) was measured using two indicators: the number of books at home, and the highest educational level attained by parents. Given the higher number of missing values in the latter variable, we employed the books variable as the primary indicator in our main models. In our initial models, we used the intraclass correlation coefficient (ICC) for both achievement and social background in a given country and year. Higher values for these indicators signify a greater level of social segregation or homogeneity within the learning environment.
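To make the ICC-based outcome more concrete, the following Python sketch computes the share of the weighted total variance that lies between schools. This simple variance decomposition is an assumption made for illustration; the estimator actually used in the study (for example, a multilevel null model) may differ, and the handling of plausible values is omitted here.

```python
from collections import defaultdict


def between_school_share(scores, schools, weights=None):
    """Weighted between-school variance as a share of total variance."""
    if weights is None:
        weights = [1.0] * len(scores)

    w_sum = defaultdict(float)   # sum of weights per school
    wx_sum = defaultdict(float)  # weighted sum of scores per school
    for x, s, w in zip(scores, schools, weights):
        w_sum[s] += w
        wx_sum[s] += w * x

    total_w = sum(w_sum.values())
    grand_mean = sum(wx_sum.values()) / total_w
    school_mean = {s: wx_sum[s] / w_sum[s] for s in w_sum}

    between = sum(w_sum[s] * (school_mean[s] - grand_mean) ** 2 for s in w_sum) / total_w
    within = sum(
        w * (x - school_mean[s]) ** 2 for x, s, w in zip(scores, schools, weights)
    ) / total_w

    total = between + within
    if total == 0:
        raise ValueError("scores have zero variance; share is undefined")
    return between / total


# Toy data: two schools with means 1 and 3, equal spread within each school.
print(between_school_share([0, 2, 2, 4], ["a", "a", "b", "b"]))
```

A value near 1 indicates that students within the same school are very similar relative to the overall spread, while a value near 0 indicates that schools are internally as heterogeneous as the country as a whole.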

As an alternative specification, we used the dissimilarity index. The dissimilarity index compares the distribution of two groups and thus requires a binary variable. We varied the cut-off points to transform the outcome variables into binary variables (see Table A1 in the Appendix). The ICC and dissimilarity index have been calculated based on weighted data, incorporating sampling weights (total student weights) to ensure that the dependent variables represent nationally representative estimates. Table A4 in the Appendix provides information on the range and distribution of the dependent and independent variables.
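As an illustration, the sketch below implements the standard two-group dissimilarity index, D = 0.5 · Σ |a_i/A − b_i/B|, where a_i and b_i are the weighted numbers of students from each group in school i and A and B are the group totals. The function name and input format are assumptions, and the cut-off used to binarize the underlying variable is left to the caller.

```python
from collections import defaultdict


def dissimilarity_index(in_group, schools, weights=None):
    """Two-group dissimilarity index across schools.

    in_group: booleans, True if the student belongs to group A.
    schools: school identifier per student.
    weights: optional sampling weight per student.
    """
    if weights is None:
        weights = [1.0] * len(in_group)

    a = defaultdict(float)  # weighted count of group A per school
    b = defaultdict(float)  # weighted count of group B per school
    for g, s, w in zip(in_group, schools, weights):
        if g:
            a[s] += w
        else:
            b[s] += w

    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both groups must be non-empty")

    all_schools = set(a) | set(b)
    return 0.5 * sum(abs(a[s] / total_a - b[s] / total_b) for s in all_schools)


# Toy data: school "a" has two group-A students and one group-B student, etc.
flags = [True, True, False, True, False, False, False]
school = ["a", "a", "a", "b", "b", "b", "b"]
print(dissimilarity_index(flags, school))
```

A binary input could be derived from a categorical indicator with a hypothetical cut-off, for example `[books >= cutoff for books in books_at_home]`; varying `cutoff` then corresponds to varying the binarization across specifications.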

3.3.2 Treatment: Early Between-School Tracking

For our treatment variable, we distinguished between early-tracking and non-tracking/late-tracking education systems. We operationalized the tracking indicator based on the grade at which tracking was first implemented. Education systems in which tracking occurs before the 8th grade are categorized as early-tracking systems, implying that students are already tracked in grade 8 or earlier. In an alternative model specification, we defined early-tracking systems as countries that track before the 9th grade. Integrated systems, i.e. non-tracking or late-tracking countries, are those with no tracking, or where tracking commences after the 8th or 9th grade. To operationalize the tracking grade, we referred to previous research (Strello et al. 2021) and additional public resources, such as the TIMSS Wiki and Eurydice (EU). An overview of the countries' timing of tracking is provided in Table A3 in the Appendix.

3.3.3 Moderator: School Selectivity

To measure the selectivity of an education system, we utilized data from the PISA school questionnaires. We used the questions asking the school principals to indicate whether admission to their school was based on a student's academic record or the recommendation of feeder schools, since this recommendation, in turn, is often based on students' academic performance. We calculated the percentage of secondary school students in a country and year attending a school that considered either one or both of these factors during the admission process. To ensure the national representativeness of the moderator variable, we calculated the student shares using sampling weights (total student weights). A value of 0 (or 100) indicates that no (or all) students attend selective secondary schools. For the model variants using TIMSS 8 survey data, we matched the school selectivity information obtained from the PISA data to the closest TIMSS country-year observation. Therefore, the moderation models are based on a smaller number of country observations as they include only those countries that participated in the PISA study. In the main analysis, we calculated the share of students attending a secondary school that considers a student's academic record in the admission decision. In alternative specifications, we calculated the share of students attending schools that consider both criteria or rely on the feeder school's recommendation. In all model variants, we mean-centered the school selectivity variable.
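The construction of the moderator can be sketched as follows. The helper names and toy values are hypothetical, and the tie-breaking behavior when two PISA years are equally close to a TIMSS year is an assumption, since the text does not specify it.

```python
def weighted_share(flags, weights):
    """Percentage of (weighted) students for whom the flag is True."""
    return 100 * sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def closest_year(target, candidates):
    """Candidate year nearest to target; ties go to the first candidate listed."""
    return min(candidates, key=lambda year: abs(year - target))


def mean_center(values):
    """Subtract the mean so that 0 corresponds to average selectivity."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy example: three students, the second carries twice the sampling weight.
attends_selective_school = [True, False, True]
student_weights = [1, 2, 1]
print(weighted_share(attends_selective_school, student_weights))

# Match a TIMSS 2011 observation to the nearest PISA cycle.
pisa_years = [2000, 2006, 2009, 2012, 2015, 2018]
print(closest_year(2011, pisa_years))

print(mean_center([10, 20, 30]))
```

Mean-centering leaves the interaction coefficient unchanged but lets the main effect of tracking be read as the effect at average selectivity.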

3.3.4 Covariates: GDP, Population Density, Private School Sector

The countries in our sample not only differ in terms of having integrated or tracked education systems, but they also exhibit other characteristics that might influence both the treatment and the outcome. To account for these differences, we estimated models with country and cohort fixed effects and thereby adjusted for all time-constant variations between countries and cohorts. Additionally, we included three time-varying covariates: GDP per capita (Teorell et al. 2022; World Bank 2021b), population density (people per square kilometer; Teorell et al. 2022; World Bank 2021b), and the size of the private school sector. We measured the private school sector's size by the percentage of students in our analysis sample enrolled in secondary schools that receive at least fifty percent of their funding from private sources (obtained from the PISA school questionnaires). Similar to the procedure for the moderation variable, we matched this information to TIMSS data. As a consequence, models that control for the private school sector are based on a smaller number of country observations, since only countries participating in PISA are included. We used sampling weights (total student weights) to calculate the nationally representative student shares. We varied the inclusion or exclusion of these covariates. For a detailed discussion of the causal status of these variables and our rationale for including or excluding them in our models, refer to Table A5 in the Appendix.

3.4 Methods

To identify the effect of tracking on social segregation and homogeneity of learning environments, we employed a Difference-in-Differences (DiD) approach (Wing et al. 2018). The DiD approach allows us to estimate the treatment effect by comparing changes in the outcome between a treatment group (tracking countries) and a control group (countries with integrated school systems) over time, specifically between primary and secondary schooling. We focused on assessing the differences in the homogeneity of learning environments and the degree of social segregation before (among 4th-grade students) and after tracking potentially occurred (among secondary school students). The DiD approach relies on certain identifying assumptions that must be met to estimate an unbiased effect (Wing et al. 2018). One crucial assumption is the parallel trends assumption, which posits that, in the absence of treatment, the average differences in outcomes between the treatment and control groups would have remained constant over time. Thus, any divergence in the trends observed after treatment can be attributed to the treatment itself. Since we only have two time points, one before and one after the treatment, we cannot empirically test the plausibility of this assumption but must assume that it is met.
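The logic of the comparison can be sketched in its simplest two-group, two-period form. The sketch below is an illustration under stated assumptions, not the estimated model: it ignores fixed effects, covariates, and weights, and the four group means are made-up numbers.

```python
def did_estimate(means):
    """Two-by-two difference-in-differences from four group means.

    `means` maps (group, period) to the mean outcome; the keys used here
    are hypothetical labels.
    """
    change_tracked = means[("tracked", "secondary")] - means[("tracked", "primary")]
    change_integrated = (
        means[("integrated", "secondary")] - means[("integrated", "primary")]
    )
    # Under parallel trends, the change in integrated systems stands in for
    # what would have happened in tracked systems without tracking.
    return change_tracked - change_integrated


# Made-up mean ICC values for illustration only
toy_means = {
    ("tracked", "primary"): 0.20,
    ("tracked", "secondary"): 0.45,
    ("integrated", "primary"): 0.20,
    ("integrated", "secondary"): 0.22,
}
did = did_estimate(toy_means)  # roughly 0.25 - 0.02 = 0.23
```

In the regression framework, this quantity corresponds to the coefficient on the interaction of the time indicator with the tracking indicator.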

To account for potential cohort effects, we matched primary and secondary school students from approximately the same cohort (e.g., TIMSS 2011 4th-grade students and TIMSS 2015 8th-grade students). By observing the same student cohort during both elementary and secondary school, we aimed to mitigate cohort-specific confounding factors. It is important to note that some surveys can be matched with several others (e.g., TIMSS 2011 4th-grade students can be matched with both PISA 2012 students and TIMSS 2011 8th-grade students), resulting in multiple observations for certain countries in the dataset. To address this issue, we weighted each observation by the inverse of the number of observations per country (1/n). This ensured that each country had an equal impact on the analyses, regardless of the number of observations. Furthermore, we estimated our models on the pooled dataset and incorporated country and cohort fixed effects as well as standard errors clustered by country and cohort.
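The 1/n weighting can be sketched as follows. The country codes and match labels are hypothetical placeholders, and the function is an illustrative reading of the description above rather than the authors' implementation.

```python
from collections import Counter


def country_weights(observations):
    """Return one weight per observation: 1 / (number of observations of its country).

    `observations` is a list of (country, match_label) pairs; both fields are
    hypothetical placeholders.
    """
    counts = Counter(country for country, _ in observations)
    return [1.0 / counts[country] for country, _ in observations]


# Made-up example: country A enters twice, country B once
observations = [
    ("A", "primary survey matched to secondary survey 1"),
    ("A", "primary survey matched to secondary survey 2"),
    ("B", "primary survey matched to secondary survey 1"),
]
weights = country_weights(observations)  # [0.5, 0.5, 1.0]
```

Because the weights of each country sum to one, a country with many matched survey pairs contributes no more to the pooled estimate than a country with a single pair.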

Table 1: Model specifications of the multiverse analysis

	Dimension		Specification	Model
I	Operationalization	SES	1 – Books at home	M1b-M2b
			2 – Parents' highest educational level	M1b-M2b
		Achievement	1 – Math	M1a-M2a
			2 – Reading	M1a-M2a
			3 – Science	M1a-M2a
		Homogeneity/Segregation indices	1 – ICC	M1-M2
			2 – Dissimilarity Index	M1-M2
		Selectivity	1 – Share of students in schools considering academic performance criteria in their admission decision	M2
			2 – Share of students in secondary schools considering the feeder school recommendations in their admission decision	M2
			3 – Share of students in schools considering academic performance criteria or feeder school recommendations in their admission decision	M2
		Early between-school tracking	1 – Tracking before 8th grade	M1-M2
			2 – Tracking before 9th grade	M1-M2
II	Covariates	GDP per capita	1 – GDP per capita included	M1-M2
			2 – GDP per capita excluded
		Population density	1 – Population density included	M1-M2
			2 – Population density excluded
		Private secondary schools	1 – Share of students in private schools excluded	M1-M2
			2 – Share of students in private schools included	M1-M2
III	Fixed effects	Cohort	1 – Included	M1-M2
		Country	1 – Included	M1-M2
	Cluster robust standard errors	Cohort	1 – Adjusted	M1-M2
			2 – Not adjusted	M1-M2
		Country	1 – Adjusted	M1-M2
IV	Subsample	Country subsample: secondary school enrollment	1 – All countries included	M1-M2
		Country subsample: secondary school enrollment	2 – Secondary school enrollment > 80 percent	M1-M2
		Country subsample: economy's income group (World Bank 2021b)	1 – All countries included	M1-M2
			2 – Low- and low-middle-income economies excluded	M1-M2
			3 – Low-, low-middle-, and upper-middle-income economies excluded	M1-M2
		PISA: grade selection	1 – 9th grade for countries tracking after 8th grade; modal grade for other countries	M1-M2
			2 – Modal grade	M1-M2

Note: The initial model specifications of the main models M1 and M2 are listed first (indicated with 1) and italicized, the alternative specifications of the multiverse analysis are listed subsequently.

Figure 1: Homogeneity of learning environments in primary and secondary school for tracked and non-/late-tracking education systems

Note: Homogeneity of learning environments was measured using the ICC of math achievement (see Table A1 in the Appendix for a detailed variable description). The figure is based on 522 observations (74 observations for early-tracking countries; 448 observations for late/no tracking countries).

4 Results

4.1 Descriptive Results

Before turning to the DiD models, we examine the data descriptively by presenting the changes in the homogeneity of learning environments (Figure 1) and social segregation (Figure 2). These figures illustrate the mean scores for countries without (represented by solid lines) and with an early between-school tracking system (represented by dashed lines) before and after the implementation of tracking (primary vs. secondary school). Figure 1 reveals three patterns. First, the level of homogeneity of learning environments is similar between the two education systems during primary school (i.e., pre-treatment). Second, there is minimal change in the homogeneity of learning environments between primary and secondary school in integrated education systems, i.e., non-tracking/late-tracking systems (solid line). Lastly, in contrast to integrated systems, early-tracking education systems demonstrate increased homogeneity of their learning environments during secondary school compared to primary school (dashed line).

Figure 2 demonstrates that both integrated and tracked education systems exhibit a comparable mean level of social segregation during primary school (i.e., pre-treatment). Integrated systems show little change in their average social segregation between primary and secondary school (i.e., post-treatment), while early-tracking education systems demonstrate increased social segregation during secondary education.

Taken together, these observations indicate systematic differences in the trends of both achievement homogeneity and social segregation between the two education systems. Tracked education systems experience an increase in both outcomes, whereas the more integrated education systems maintain stability.

4.2 Multivariate Results

We begin by discussing our initial specifications before turning to the multiverse analyses. Our first research question aims to investigate whether tracking contributes to a more homogeneous learning environment (Hypothesis 1) and higher social segregation of schools (Hypothesis 2).

Figure 2: Social segregation in primary and secondary school for tracked and non-/late-tracking countries

Note: Social segregation was measured using the ICC of books at home (books 1, see Table A1 in the Appendix for a detailed variable description). The figure is based on 794 observations (148 observations for early-tracking countries; 646 observations for late/no tracking countries).

Model 1 (Table 2) displays the DiD-estimate of early tracking (time x early tracking) on both the homogeneity of learning environments (M1a) and the social segregation (M1b). This estimate indicates whether the tracking of students leads to an increase in homogeneity of learning environments (or alternatively, in social segregation) when compared to integrated education systems.

Table 2: Model 1 – Tracking Effect

	M1a: Homogeneity of learning environments (ICC of Math achievement)		M1b: Social segregation (ICC of books at home)
	Coef.	Cluster-robust Std. Err.	Coef.	Cluster-robust Std. Err.
Time: secondary school (Ref.: primary school)	–.015	(.021)	–.026	(.015)
Tracking (Ref.: late-tracking countries)	(omitted)	(omitted)	(omitted)	(omitted)
Time x Tracking	.255***	(.051)	.038	(.031)
Population Density	.622**	(.182)	.021	(.072)
Time x Population Density	–.477***	(.060)	–.221***	(.039)
GDP per capita	–.763**	(.219)	–.034	(.044)
Time x GDP per capita	.621***	(.095)	.302***	(.067)
Intercept	.320***	(.031)	.145***	(.010)
Country fixed effects	✓		✓
Cohort fixed effects	✓		✓
Observations	522		794
Countries	58		60
Cohorts	12		18
R-squared	.78		.70
Within R-squared	.36		.096

Note: The outcome variable of model M1a is the ICC of math achievement (see Table A1 in the Appendix for a detailed variable description). The model's standard errors are adjusted for 58 country and 12 cohort clusters. The outcome variable of model M1b is the ICC of books at home (books 1, see Table A1 in the Appendix for a detailed variable description). The model's standard errors are adjusted for 60 country and 18 cohort clusters. The varying numbers of observations between the models result from the fact that the respective outcome variables were surveyed with different frequencies (see data section). * p < .05; ** p < .01; *** p < .001

Our findings regarding the homogeneity of learning environments suggest that tracking indeed contributes to an increase in the similarity of student achievement in schools (M1a: b = .255, rob. SE = .051). Specifically, the similarity of student achievement in schools increases by about .25 units more between primary and secondary school in early-tracking education systems than in integrated systems. This effect is substantial considering that our dependent variable has a standard deviation of .152 and ranges from .04 to a maximum of .79 (Table A4 in the Appendix). Thus, our results support Hypothesis 1, indicating that tracking leads to a more homogeneous learning environment.

Figure 3: Predictive margins plot of the DiD-tracking effect on homogenous learning environment (model M1a)

Note: The outcome variable of model M1a is the ICC of math achievement (see Table A1 in the Appendix for a detailed variable description). The model's standard errors are adjusted for 58 country and 12 cohort clusters.

Figure 3 visualizes the predicted DiD estimate of tracking. It demonstrates that in countries with an integrated education system, there is no substantial increase in the homogeneity of learning environments between primary and secondary school. Conversely, in tracked countries, the homogeneity of learning environments tends to increase as students progress to secondary school.

Next, we examine the tracking effects on social segregation in model M1b. We find a positive effect of early tracking on social segregation between schools (b = .038, rob. SE = .031), although this finding is associated with a high degree of statistical uncertainty. The predicted impact is nevertheless not small, given that the predicted change constitutes about 57 percent of the standard deviation in social segregation (.066). Figure 4 illustrates that social segregation tends to increase in early-tracking countries, although this increase does not reach statistical significance. Thus, model M1b does not confirm Hypothesis 2.
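The size of the two tracking coefficients relative to the spread of their outcome variables can be recomputed from the rounded values reported in the text and in Table 2. The helper names below are hypothetical, and because the inputs are rounded, the results only approximate the figures the authors obtained from unrounded estimates.

```python
def in_sd_units(coefficient, outcome_sd):
    """Express a coefficient as a multiple of the outcome's standard deviation."""
    return coefficient / outcome_sd


def estimate_to_se_ratio(coefficient, standard_error):
    """Ratio of an estimate to its cluster-robust standard error."""
    return coefficient / standard_error


# Rounded values taken from the text and Table 2
m1a_sd = in_sd_units(0.255, 0.152)           # about 1.68 standard deviations
m1b_sd = in_sd_units(0.038, 0.066)           # about 0.58, near the reported 57 percent
m1a_ratio = estimate_to_se_ratio(0.255, 0.051)  # about 5.0
m1b_ratio = estimate_to_se_ratio(0.038, 0.031)  # about 1.2
```

The contrast between the two ratios mirrors the text: the estimate for the homogeneity of learning environments is several times its standard error, whereas the estimate for social segregation is only slightly larger than its standard error.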

The incremental change of the within R² before (see Table A6 in the Appendix) and after including the DiD tracking effect (see Table 2) suggests that tracking has more explanatory power for the homogeneity in achievement than for the social segregation of schools. With the inclusion of the tracking effect, the within R² increases by 166 percent to .358 in model M1a (homogenous learning environments) and by 43 percent to .096 in model M1b (social segregation).

Our second research question examines whether a stronger school selectivity based on prior achievement moderates the effect of tracking on the two outcome variables (Table 3). Contrary to our hypothesis, the results from Model M2a do not indicate that school selectivity moderates the effect of tracking on the homogeneity of learning environments (b = –.002, rob. SE = .002). Similarly, Model M2b does not suggest that school selectivity mitigates the positive effect of tracking on social segregation (b = –.000, rob. SE = .001). These findings collectively indicate that a higher degree of selectivity of secondary schools does not lead to increased homogeneity of learning environments within tracked systems, nor does it alleviate the potential increase in social segregation. Thus, our findings do not provide support for Hypotheses 3a and 3b.

Figure 4: Predictive margins plot of the DiD-tracking effect on social segregation (model M1b)

Note: The outcome variable of model M1b is the ICC of books at home (books 1, see Table A1 in the Appendix for a detailed variable description). The model's standard errors are adjusted for 60 country and 18 cohort clusters.

Table 3: Model 2 – Moderating effect of tracking x school selectivity

	M2a: Homogeneity of learning environments (ICC of Math achievement)		M2b: Social segregation (ICC of books at home)
	Coef.	Cluster-robust Std. Err.	Coef.	Cluster-robust Std. Err.
Time: secondary school (Ref.: Primary School)	.008	(.038)	–.034	(.021)
Tracking (Ref.: late-tracking countries)	(omitted)	(omitted)	(omitted)	(omitted)
Time x Tracking	.255***	(.029)	.049	(.026)
School Selectivity	–.001	(.001)	–.000	(.000)
Time x School Selectivity	.002	(.001)	.000	(.001)
School Selectivity x Tracking	.002*	(.001)	.001	(.001)
Time x School Selectivity x Tracking	–.002	(.002)	–.000	(.001)
Private Secondary School	–.084	(.152)	–.167*	(.078)
Time x Private Secondary School	–.109	(.162)	–.019	(.097)
Population Density	.292**	(.073)	–.294	(.157)
Time x Population Density	–.551**	(.134)	–.243*	(.086)
GDP per capita	–.437**	(.113)	.284	(.162)
Time x GDP per capita	.677**	(.179)	.326*	(.114)
Intercept	.323***	(.031)	.176***	(.014)
Country fixed effects	✓		✓
Cohort fixed effects	✓		✓
Observations	360		618
Cluster: Country	34		36
Cohorts	12		18

Note: The outcome variable of model M2a is the ICC of math achievement (see Table A1 in the Appendix for a detailed variable description). The model's standard errors are adjusted for 34 country and 12 cohort clusters. The outcome variable of model M2b is the ICC of books at home (books 1, see Table A1 in the Appendix for a detailed variable description). The model's standard errors are adjusted for 36 country and 18 cohort clusters. The varying numbers of observations between the models result from the fact that the respective outcome variables were surveyed with different frequencies (see data section). * p < .05; ** p < .01; *** p < .001

4.3 Multiverse Analyses

4.3.1 Tracking Effect

The results discussed so far represent only one specification out of a large number of plausible specifications (Simonsohn et al. 2020; Steegen et al. 2016). Therefore, we employed multiverse analysis to assess the robustness of our findings across alternative specifications. Figure 5 displays the specification curves of the tracking effect in Model M1a, which examines the effect of early tracking on the homogeneity of learning environments. To generate the specification curves, we ran 5,760 models. These models constitute all possible combinations of model choices that can be derived from the different specifications described in Table 1. To simplify the visual presentation, we randomly selected 100 models from the total of 5,760 models (Simonsohn et al. 2020). The specification curves based on all specifications are presented in the Appendix (Figures A3 and A4). Each point in the upper part of the figure represents the estimated tracking effect for a particular specification together with the corresponding 95 percent confidence interval. The gray dots indicate statistically insignificant coefficients and the black dots statistically significant coefficients. The black circle indicates the estimate of our initial model specification that we discussed in the previous section. The lower part of the figure presents the specifications of each model, such as the segregation indicator and covariates used.
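How such a grid of specifications arises can be sketched by taking the Cartesian product of the analytic choices. The dictionary below contains only a subset of the dimensions in Table 1, with hypothetical labels, so it deliberately does not reproduce the 5,760 models; it merely illustrates the combinatorial principle and the random draw used for plotting.

```python
import random
from itertools import product

# Illustrative subset of the choices in Table 1; labels are hypothetical.
# The first option of each dimension mirrors the initial specification.
dimensions = {
    "achievement_domain": ["math", "reading", "science"],
    "tracking_cutoff": ["before grade 8", "before grade 9"],
    "gdp_per_capita": ["included", "excluded"],
    "population_density": ["included", "excluded"],
    "private_school_share": ["excluded", "included"],
    "cohort_clustered_se": ["adjusted", "not adjusted"],
}


def enumerate_specifications(dims):
    """Return every combination of choices as a dict (one dict per model)."""
    names = list(dims)
    return [
        dict(zip(names, combination))
        for combination in product(*(dims[name] for name in names))
    ]


specifications = enumerate_specifications(dimensions)  # 3 * 2**5 = 96 combinations
initial_specification = specifications[0]

# Draw a reproducible random subset for plotting, analogous to the n = 100 sample
plot_sample = random.Random(0).sample(specifications, 10)
```

Each additional dimension multiplies the number of models by its number of options, which is why a moderate number of defensible choices quickly leads to several thousand specifications.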

The results of the multiverse analysis demonstrate the high robustness of our findings in terms of direction and statistical significance: The tracking effect on the homogeneity of learning environments was consistently positive across all model specifications and statistically significant in 99.8 percent of the models (p-value < .05 in 5,749 out of 5,760 models) (Figure 5). This indicates that the positive relationship between early tracking and the homogeneity of learning environments is a robust finding. Despite the overall robustness of the results, we observe some variation in the effect size. The operationalization of the outcome variable appears to have a notable influence on the magnitude of the tracking effect. Specifically, the dissimilarity index based on the 10 percent quantile cutoff seems to yield systematically lower tracking effects, whereas using the ICC as outcome is associated with systematically stronger tracking effects.

Figure 5: Specification curve of the tracking effect on the homogeneity of learning environments (model M1a)

Note: The specification curve is based on a random sample of n = 100 out of 5,760 model specifications. The tracking effect of the initial model specification (M1a) discussed above is b = .255.

Figure 6: Specification curve of the tracking effect on social segregation (model M1b)

Note: The specification curve is based on a random sample of n = 100 out of 3,840 model specifications. The tracking effect of the initial model specification (M1b) discussed above is b = .038.

Figure 6 illustrates the specification curve for the tracking effect on social segregation (model M1b). The possible combinations of different specifications led to a total of 3,840 models. Again, we drew a random sample of n = 100 for presentation. The statistical significance of the estimates is less consistent, with only 26 percent of the estimated coefficients being statistically significant (p-value < .05 in 999 out of 3,840 models). However, the results are consistent in terms of sign stability: 87.7 percent of the estimated coefficients are positive (b > 0 in 3,366 out of 3,840 models). Similar to the main analysis, we do not find clear support for the notion that tracking increases social segregation between schools. However, given the consistently positive sign of the tracking effect, we cannot easily reject Hypothesis 2. Similar to the specification curve described above, Figure 6 suggests that the operationalization of the outcome variable influences the effect size. Specifically, outcomes based on parental education appear to be associated with smaller tracking effects compared to outcomes based on the books at home variable.

To further investigate the impact of different model variants on the tracking effects, we conducted an influence regression using Young and Holsteen's (2017) approach. The influence regression provides insight into how each specification, on average, affects the tracking effect on homogenous learning environment (coefficient: Figure A5 in the Appendix; p-value: Figure A6 in the Appendix) and social segregation (coefficient: Figure 7; p-value: Figure 8). Given the high robustness of the tracking effect on homogenous learning environments and the large variance observed for social segregation, we focus on the influence regression of the latter outcome variable.

The magnitude of the estimated tracking effects on social segregation (Figure 7) and their statistical significance (Figure 8) are primarily influenced by the operationalization of the outcome variable. Specifically, models based on parents' education and the dissimilarity index tend to yield lower tracking coefficients and higher p-values compared to our initial model specification. Similarly, using a social segregation variable based on more than 200 books at home (books 6) decreases the estimated effect by .03. However, we find evidence of tracking having a segregation-enhancing effect when examining the segregation of students from low socioeconomic status (SES) families, as indicated by the operationalizations based on books 3 and books 4. This suggests that tracking might indeed increase social segregation, but not in a uniform way: It does not appear to increase segregation among children coming from the highest social strata (referred to as "elite-segregation") but to increase social segregation among children from disadvantaged social backgrounds. This would be consistent with a situation in which children from high- and middle-class backgrounds predominantly attend the higher track schools, whereas children from disadvantaged, lower-class backgrounds attend the lower track schools. Moreover, models based on a country sample that is more similar in terms of income levels lead to higher tracking estimates and lower p-values. For instance, when utilizing subsamples consisting of high-income economies, the tracking effect increases by .014 (Figure 7) and the p-value decreases by .12 (Figure 8).
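The idea behind an influence regression can be illustrated with a simplified sketch. It assumes a balanced full grid of specifications, in which the OLS coefficient on a specification dummy coincides with the difference in mean estimates between models using that choice and models using the reference choice. The dimension names and the toy estimates are made up, and this is not Young and Holsteen's software or the authors' code.

```python
from itertools import product
from statistics import mean


def influence(results, dimension, level, reference):
    """Average shift in the estimate when `level` replaces `reference`.

    `results` is a list of (specification dict, estimate) pairs from a
    balanced full grid of specifications.
    """
    with_level = [est for spec, est in results if spec[dimension] == level]
    with_reference = [est for spec, est in results if spec[dimension] == reference]
    return mean(with_level) - mean(with_reference)


# Made-up multiverse: estimates are generated, not taken from the study
results = []
for ses, sample, gdp in product(
    ["books at home", "parental education"],
    ["all countries", "high income only"],
    ["included", "excluded"],
):
    estimate = 0.040
    if ses == "parental education":
        estimate -= 0.020  # toy shift for the SES measure
    if sample == "high income only":
        estimate += 0.010  # toy shift for the country subsample
    results.append(({"ses": ses, "sample": sample, "gdp": gdp}, estimate))

ses_influence = influence(results, "ses", "parental education", "books at home")
sample_influence = influence(results, "sample", "high income only", "all countries")
gdp_influence = influence(results, "gdp", "excluded", "included")
```

In this toy grid the function recovers the shifts that were built into the estimates (about -0.020 and 0.010) and a shift of zero for the covariate choice, which is the kind of summary that Figures 7 and 8 report for the actual specifications.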

Figure 7: Influence regression for regression coefficients of the tracking effect on social segregation (model M1b)

Note: The influence regression is based on 3,840 model specifications. The tracking effect of the initial model specification (M1b) is b = .038.

4.3.2 Moderation Effect: School Selectivity

Next, we assessed the robustness of the moderating effect of school selectivity on the homogeneity of learning environments. Figure 9 presents the specification curve for this analysis (model M2a; refer to Figure A7 in the Appendix for the specification curve encompassing all specifications). From the various specifications, we generated a total of 17,280 models and drew a random sample of n = 100 models for visualization. The specification curves indicate that school selectivity does not moderate the tracking effect. Only approximately 43.8 percent of the model specifications (7,576 out of 17,280 models) exhibit a positive moderation effect of school selectivity on tracking, and only 11.9 percent of the specifications (2,056 out of 17,280 models) reach statistical significance.

Figure 8: Influence regression for p-values of the tracking effect on social segregation (model M1b)

Note: The influence regression is based on 3,840 model specifications. The p-value of the initial model specification (M1b) is p = .239.

In the multiverse analysis in Figure 10 (see Figure A8 in the Appendix for the specification curve based on all specifications), we examined whether school selectivity moderates the tracking effect on social segregation. The different specifications yielded 11,520 models; again, we used a randomly selected sample of n = 100 models for visualization. The specification curves suggest that school selectivity does not moderate the tracking effect on social segregation. The moderation effects are negative in about 40.7 percent of model specifications (4,688 out of 11,520 models), with statistical significance found in only about two percent of the model specifications (239 out of 11,520 models). Given the high robustness of these results, we do not delve further into the influence of the individual specifications. Detailed results of the influence regressions for the homogenous learning environment models (Figures A9 and A10) and the social segregation models (Figures A11 and A12) are provided in the Appendix.

In summary, the findings of the multiverse analysis do not support our hypotheses that school selectivity moderates the tracking effects. The results suggest that school selectivity neither reinforces the positive tracking effect on homogenous learning environments nor mitigates the tracking effect on social segregation.
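The shares reported in this section follow directly from the model counts given in the text, as the short recomputation below shows. Only the dictionary labels are our own shorthand; the counts are those stated above.

```python
# (number of models with the property, total number of models), from the text
counts = {
    "M1a significant": (5749, 5760),
    "M1b significant": (999, 3840),
    "M1b positive": (3366, 3840),
    "M2a positive moderation": (7576, 17280),
    "M2a significant": (2056, 17280),
    "M2b negative moderation": (4688, 11520),
    "M2b significant": (239, 11520),
}

# Percentage of specifications with each property, rounded to one decimal
shares = {
    label: round(100 * hits / total, 1) for label, (hits, total) in counts.items()
}
```

The resulting values (99.8, 26.0, 87.7, 43.8, 11.9, 40.7, and 2.1 percent) agree with the percentages reported in the text.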

Figure 9: Specification curve of the moderation effect (selectivity x tracking) on the homogeneity of learning environments (model M2a)

Note: The specification curve is based on a random sample of n = 100 out of 17,280 model specifications. The tracking x selectivity effect of the initial model specification (M2a) discussed above is b = –.002.

Figure 10: Specification curve of the moderation effect (selectivity x tracking) on social segregation (model M2b)

Note: The specification curve is based on a random sample of n = 100 out of 11,520 model specifications. The tracking x selectivity effect of the initial model specification (M2b) discussed above is b = –.000.

5 Discussion

In this study, we aim to contribute to the ongoing debate on early between-school tracking. We take a step back from the debate on the consequences of tracking for (inequality of) achievement and investigate its initial effects. Specifically, we address whether tracking leads to more homogenous learning environments and increased social segregation. We also explore whether these tracking effects are moderated by the selectivity of an education system. Despite the importance of these questions for understanding the empirical evidence on the effects of tracking on achievement inequality, they have received surprisingly little attention (however, see Strello et al. 2021; Engzell & Raabe 2023). To address these questions, we utilized a Difference-in-Differences approach by pooling data from PISA, PIRLS, and TIMSS covering a period of 24 years. Moreover, to ensure robustness, we conducted a multiverse analysis consisting of a wide range of plausible specifications.

While our study provides robust evidence that tracking leads to increased similarity in student achievement, the results regarding similarity in social status are more varied. Regardless of the sample composition, variable operationalization, and control variables, we consistently find a positive effect of tracking on the homogenization of learning environments across all 5,760 specifications, and these effects are virtually always significant (99.8 percent). In contrast, the effects of tracking on social segregation have yielded mixed results. A common criticism is that students from privileged (disadvantaged) backgrounds tend to be overrepresented in higher (lower) tracks. Although the vast majority (87.7 percent) of our model specifications result in positive estimates (indicating increased social segregation due to tracking), only about a quarter of them reach statistical significance. Our multiverse analysis revealed that the effect size and level of statistical uncertainty strongly depend on the operationalization of the outcome.

Interestingly, the influence regression predicts that tracking has the strongest effect on social segregation among students from lower social backgrounds while having a less pronounced effect on students from middle or high social backgrounds. This suggests that tracking might work in a non-uniform way: It does not appear to increase segregation among children coming from the highest social strata, i.e., elite-segregation, but rather at the other end of the social spectrum, i.e., among children from disadvantaged social backgrounds.

With regard to school selectivity (i.e., the extent to which the sorting process is based on prior achievement), our results contradict the hypotheses. The vast majority of specifications do not indicate that selectivity influences the relationship between tracking and similarity in student achievement or between tracking and student social background. This is in line with the heterogeneous findings in the literature on selectivity based on within-country designs (cf. Esser & Hoenig 2018; Jähnen & Helbig 2015; Lorenz et al. 2023).

How can we reconcile these findings with existing literature? The essential intention of between-school tracking – increased similarity among students – appears to be fulfilled. However, the majority of empirical evidence suggests that while tracking does not lead to an increase in (average) achievement, it does contribute to social inequality (Terrin & Triventi 2022). Three key factors should be considered in this context.

First, tracking may coincide with stratification (Betts 2011; Terrin & Triventi 2022), which implies that higher tracks generally offer better learning conditions, such as improved student-to-teacher ratios (Brunello & Checchi 2007: 795), or a self-selection of highly motivated teachers into more prestigious (and/or better paying) higher tracks (Betts 2011). This might also give rise to a stigmatization of students in the lower tracks ([34] & Ingham 2021). Consequently, when examining a country's average achievement, any advantages resulting from homogeneity of learning environments could be counterbalanced by the poorer conditions in lower track schools, resulting in a "zero-sum game" that does not translate into an overall increase in average achievement but rather amplifies social inequality in achievement. This strongly suggests the importance of investigating how tracking affects variance in achievement and its distribution across different levels to uncover potentially nonuniform effects.

Second, it is important to recognize that homogeneity of learning environments is not exclusive to between-school tracking. In integrated (i.e., non-/late-tracked) systems, within-school tracking or streaming may exist (Betts 2011; Chmielewski et al. 2013). In such cases, although the overall student body in integrated schools may be heterogeneous in terms of academic ability, specific learning environments within the school, such as individual courses, can still be homogenous. Hence, it may be that homogenous learning environments are conducive to better learning outcomes (Duflo et al. 2011), but that those can be implemented without early between-school tracking and its potential negative side-effects. Nevertheless, the limited evidence directly examining the impact of homogeneity of learning environments yields contradictory results (Chmielewski et al. 2013; Duflo et al. 2011; Matthewes 2021; [36] 2011), and it is not yet firmly established how, and through which mechanisms, homogeneity or heterogeneity influences the learning outcomes of students with different levels of ability.

Third, the debate on tracking often overlooks the fact that social segregation in schools is also prevalent in integrated systems ([24] et al. 2008). Segregation may emerge due to factors such as residential patterns or marketization of the educational sector ([19] & Siddiqui 2018).

5.1 Limitations

We acknowledge five main limitations of our study. First, our large-scale comparative approach investigates aggregated effects at the country level, which limits our ability to capture variation within countries. While this approach allows us to observe a large sample of education systems over time, we may miss important nuances and heterogeneity within each country.

Second, our study combined data from different datasets (PISA, TIMSS, and PIRLS), which vary in sampling schemes and measures (Meinck 2015). PISA aims to measure general academic performance, while TIMSS and PIRLS aim to measure curriculum-based knowledge. However, we do not compare measures of achievement per se, but only the similarity of students within schools. Consequently, we do not believe that the differences in measurement pose a fundamental threat to the validity of our design. Most importantly, these differences are unrelated to the treatment (tracked vs. integrated). The surveys also differ in their sampling strategies in two important ways. First, PISA samples students at the age of 15, while the other surveys target students from specific grades (PIRLS 4th grade, TIMSS 4th and 8th grade). We mitigated the potential impact of this difference by including only one grade from the PISA surveys. Second, PISA samples students at the school level, whereas TIMSS and PIRLS sample students within classes (within schools). This difference in sampling may have implications for our approach, as students within a class tend to be more similar in achievement than those across an entire school. Since our pre-treatment measures (primary school) are derived from either TIMSS or PIRLS, this might lead to a slight overestimation of student similarity before treatment in primary schools. However, this potential bias, if present, would actually lead to a more conservative test of the effects of tracking.

Third, when interpreting the theoretically unexpected (non-)effects of selectivity, two things should be considered. First, our selectivity measure is necessarily noisy. It is aggregated from countries which are potentially heterogeneous at the level of states or districts. Furthermore, while our measure captures whether student achievement was considered, it provides only limited variation in the degree to which achievement was considered. Though there likely are larger differences in the selectivity of early between-school tracking countries, our data indicate an overall high level of school selectivity in tracked education systems. To better understand the effects of selectivity on tracking, future research could further explore within-country differences (cf. Esser & Hoenig 2018; Jähnen & Helbig 2015).

Fourth, our use of a Difference-in-Differences design, while appropriate for our analysis, relies on the parallel-trends assumption. While this assumption can never be explicitly tested (because it refers to a counterfactual outcome), approaches exist to make this assumption more plausible. However, these approaches rely on the existence of rich data (e.g., to gauge prior trends), or specific conditions (e.g., triple DiD) which are not available in our case. We argue, however, that there are reasons to believe that our approach is appropriate. Note the absence of (substantial) differences in the levels of homogeneous learning environments or social segregation of schools when both groups are observed in primary school, but a marked divergence when these groups are observed after a potential tracking policy was applied. Moreover, this result is robust when a number of potential confounders are controlled directly (e.g., students in private schools) or indirectly (e.g., country and cohort fixed effects).
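To make the logic of the design concrete, the following is a minimal sketch of a two-group, two-stage Difference-in-Differences comparison. All scores are invented for illustration and are not taken from PISA, TIMSS, or PIRLS; the names `DATA`, `cell_mean`, and `did_estimate` are hypothetical. The estimate is only interpretable as a tracking effect if the integrated group's change approximates the change the tracked group would have shown without tracking (the parallel-trends assumption).

```python
from statistics import mean

# Hypothetical homogeneity scores per education system and stage
# (higher = students within a school are more similar). Invented numbers.
DATA = [
    # (group, stage, score)
    ("tracked", "primary", 0.70),
    ("tracked", "primary", 0.72),
    ("tracked", "secondary", 0.85),
    ("tracked", "secondary", 0.87),
    ("integrated", "primary", 0.71),
    ("integrated", "primary", 0.69),
    ("integrated", "secondary", 0.74),
    ("integrated", "secondary", 0.76),
]


def cell_mean(group, stage):
    """Average score for one group observed at one stage."""
    return mean(s for g, st, s in DATA if g == group and st == stage)


def did_estimate():
    """Change in the tracked group minus change in the integrated group."""
    change_tracked = cell_mean("tracked", "secondary") - cell_mean("tracked", "primary")
    change_integrated = cell_mean("integrated", "secondary") - cell_mean("integrated", "primary")
    return change_tracked - change_integrated


if __name__ == "__main__":
    # Both groups start at similar levels in primary school; the estimate
    # captures how much more the tracked group changes afterwards.
    print(round(did_estimate(), 3))
```

In this toy example both groups start at nearly the same level in primary school, which mirrors the pattern described above; similar starting levels make the assumption more plausible but do not prove it.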

Fifth, our measure of homogeneous learning environments could be influenced by differential learning rates across tracks, since our data do not allow us to control for this effect. However, our influence regression shows that there are only small differences in effect sizes between our different tracking definitions (which indirectly measure the time students jointly spent in secondary education). Thus, differences in learning rates should not drive our results.

Finally, multiverse analysis is a great approach to making data-analytic decisions transparent and to assessing the robustness of the results (Simonsohn et al. 2020; Steegen et al. 2016; Young & Holsteen 2017). However, it offers no remedy if the underlying design is flawed. In other words, if the models are misspecified, it does not matter if we estimate one or 100,000 models. Thus, one should still be cautious when interpreting the results. Nevertheless, we believe that a multiverse analysis can increase the reliability and the credibility of empirical investigations.
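The idea of a multiverse analysis can be sketched as follows: enumerate every combination of defensible analytic decisions, estimate the same quantity under each, and summarize the distribution of estimates. The sketch below is illustrative only. The records, the two decision dimensions (the age cutoff defining "early" tracking and a leave-one-country-out exclusion), and all names are assumptions made for this example, not the specifications used in the study.

```python
from itertools import product
from statistics import mean, median

# Hypothetical records: (country, tracking_age, stage, homogeneity_score).
RECORDS = [
    ("A", 10, "primary", 0.70), ("A", 10, "secondary", 0.86),
    ("B", 12, "primary", 0.72), ("B", 12, "secondary", 0.84),
    ("C", 14, "primary", 0.71), ("C", 14, "secondary", 0.78),
    ("D", 16, "primary", 0.69), ("D", 16, "secondary", 0.74),
    ("E", 16, "primary", 0.70), ("E", 16, "secondary", 0.76),
]

# Analytic decisions that span the (toy) multiverse.
CUTOFFS = (11, 13, 15)                     # tracked if tracking_age < cutoff
DROPPED = (None, "A", "B", "C", "D", "E")  # leave-one-country-out


def did(rows, cutoff):
    """2x2 difference-in-differences; None if any cell is empty."""
    cells = {(t, s): [] for t in (True, False) for s in ("primary", "secondary")}
    for _country, age, stage, score in rows:
        cells[(age < cutoff, stage)].append(score)
    if any(len(v) == 0 for v in cells.values()):
        return None
    tracked = mean(cells[(True, "secondary")]) - mean(cells[(True, "primary")])
    integrated = mean(cells[(False, "secondary")]) - mean(cells[(False, "primary")])
    return tracked - integrated


def run_multiverse():
    """Estimate the effect under every valid combination of decisions."""
    results = {}
    for cutoff, dropped in product(CUTOFFS, DROPPED):
        rows = [r for r in RECORDS if r[0] != dropped]
        estimate = did(rows, cutoff)
        if estimate is not None:
            results[(cutoff, dropped)] = estimate
    return results


if __name__ == "__main__":
    estimates = sorted(run_multiverse().values())
    print(len(estimates), "valid specifications")
    print("median estimate:", round(median(estimates), 3))
    print("share positive:", sum(e > 0 for e in estimates) / len(estimates))
```

Reporting the full distribution rather than a single preferred model shows how sensitive a conclusion is to these decisions. As noted above, it cannot repair a design whose identifying assumptions fail in every specification.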

5.2 Conclusion

This study expands our knowledge of educational tracking by empirically examining two potential effects arising from tracking: (a) the promotion of homogeneous learning environments, and (b) the potential to exacerbate social segregation among schools. Furthermore, we investigated the moderating role of school selectivity to gain a better understanding of the nuanced effects of tracking. Our findings support the notion that tracking does, indeed, increase the homogeneity of learning environments. It can contribute to the social segregation of schools, apparently more at the lower end of the social strata. However, our analysis did not find evidence suggesting that higher selectivity in tracking systems moderates these associations.

In addition to these empirical contributions, we underscored the importance of systematically testing the robustness of findings across various model specifications. By using large-scale assessment data, we demonstrated how multiverse analyses can enhance transparency in data-analytical decision making and shed light on the potential implications for empirical results.

Data Note

We used survey data from the PISA, PIRLS, and TIMSS studies. The PISA data between 2000 and 2018 are available at https://www.oecd.org/pisa/data. The TIMSS and PIRLS data between 1995 and 2019 are available at the International Association for the Evaluation of Educational Achievement (IEA) website: https://timssandpirls.bc.edu/databases-landing.html.

The macro data on the tracking grade, GDP, and population density are available from the article's supplementary materials. GDP and population density were derived from the World Bank: https://databank.worldbank.org/source/world-development-indicators.

The R and Stata scripts for the data preparation and analyses are available at the following link: https://osf.io/cuq75/?view_only=dd09143528e043b6ba2147dcdcb71ba9.

Acknowledgments

We would like to express our gratitude to the three anonymous reviewers whose insights and suggestions substantially contributed to improving this manuscript. Our thanks also extend to our student assistants, Lisanne Strasser, Nakia El-Sayed, Marc Pelzer and Luisa Zecher, for their support. This study was conducted as part of a project financed by the Deutsche Forschungsgemeinschaft (DFG) under grant number 430266278.

References

Backes, S. & A. Hadjar, 2017: Educational Trajectories Through Secondary Education in Luxembourg: How Does Permeability Affect Educational Inequalities? Swiss Journal of Educational Research 39: 437–460.
Betts, J. R., 2011: The Economics of Tracking in Education. Pp. 341–381 in: E. A. Hanushek, S. Machin & L. Woessmann (eds.), Handbook of the Economics of Education. Amsterdam: Elsevier.
Boone, S. & M. Van Houtte, 2013: Why Are Teacher Recommendations at the Transition from Primary to Secondary Education Socially Biased? A Mixed-Methods Research. British Journal of Sociology of Education 34: 20–38.
Boudon, R., 1974: Education, Opportunity, and Social Inequality: Changing Prospects in Western Society. New York, London, Sydney, Toronto: Wiley.
Breen, R. & J. H. Goldthorpe, 1997: Explaining Educational Differentials: Towards a Formal Rational Action Theory. Rationality and Society 9: 275–306.
Brunello, G. & D. Checchi, 2007: Does School Tracking Affect Equality of Opportunity? New International Evidence. Economic Policy 22: 782–861.
Chmielewski, A. K., H. Dumont & U. Trautwein, 2013: Tracking Effects Depend on Tracking Type: An International Comparison of Students' Mathematics Self-Concept. American Educational Research Journal 50: 925–957.
Duflo, E., P. Dupas & M. Kremer, 2011: Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya. American Economic Review 101: 1739–1774.
Dumont, H., D. Klinge & K. Maaz, 2019: The Many (Subtle) Ways Parents Game the System: Mixed-Method Evidence on the Transition into Secondary-School Tracks in Germany. Sociology of Education 92: 199–228.
Engzell, P. & I. J. Raabe, 2023: Within-School Achievement Sorting in Comprehensive and Tracked Systems. Sociology of Education 95: 324–343.
Erikson, R. & J. Jonsson, 1996: Can Education be Equalized?: The Swedish Case in Comparative Perspective. Boulder, Oxford: Westview.
Erikson, R. & F. Rudolphi, 2010: Change in Social Selection to Upper Secondary School-Primary and Secondary Effects in Sweden. European Sociological Review 26: 291–305.
Esser, H., 2016: Bildungssysteme und ethnische Bildungsungleichheiten. Pp. 331–396 in: C. Diehl, C. Hunkler & C. Kristen (eds.), Ethnische Ungleichheiten im Bildungsverlauf: Mechanismen, Befunde, Debatten. Wiesbaden: Springer.
Esser, H. & K. Hoenig, 2018: Leistungsgerechtigkeit und Bildungsungleichheit. Kölner Zeitschrift für Soziologie und Sozialpsychologie 70: 419–447.
Esser, H. & J. Seuring, 2020: Kognitive Homogenisierung, schulische Leistungen und soziale Bildungsungleichheit: Theoretische Modellierung und empirische Analyse der Effekte einer strikten Differenzierung nach den kognitiven Fähigkeiten auf die Leistungen in der Sekundarstufe und den Einfluss der sozialen Herkunft in den deutschen Bundesländern mit den Daten der „National Educational Panel Study" (NEPS). Zeitschrift für Soziologie 49: 277–301.
Esser, H. & J. Seuring, 2023: Was ist Dein Replicandum? Eine Antwort auf die Replik von Heisig und Matthewes (2022) zum Beitrag von Esser und Seuring (2020) über „Kognitive Homogenisierung, schulische Leistungen und soziale Bildungsungleichheit". Zeitschrift für Soziologie 52: 338–343.
Gamoran, A., 1986: Instructional and Institutional Effects of Ability Grouping. Sociology of Education 59: 185–198.
Gelman, A. & E. Loken, 2013: The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is no "Fishing Expedition" or "p-Hacking" and the Research Hypothesis Was Posited Ahead of Time. Department of Statistics, Columbia University 348.
Gorard, S. & N. Siddiqui, 2018: Grammar Schools in England: A New Analysis of Social Segregation and Academic Outcomes. British Journal of Sociology of Education 39: 909–924.
Hallinan, M. T., 1994: Tracking: From Theory to Practice. Sociology of Education 67: 79–84.
Hanushek, E. A. & L. Wößmann, 2006: Does Educational Tracking Affect Performance and Inequality? Differences-in-Differences Evidence Across Countries. The Economic Journal 116: C63–C76.
Heisig, J. P. & S. H. Matthewes, 2022: No Evidence That Strict Educational Tracking Improves Student Performance Through Classroom Homogeneity: A Critical Reanalysis of Esser and Seuring (2020). Zeitschrift für Soziologie 51: 99–111.
Jähnen, S. & M. Helbig, 2015: Der Einfluss schulrechtlicher Reformen auf Bildungsungleichheiten zwischen den deutschen Bundesländern: Eine quasi-experimentelle Untersuchung am Beispiel der Verbindlichkeit von Übergangsempfehlungen. Kölner Zeitschrift für Soziologie und Sozialpsychologie 67: 539–571.
Jenkins, S. P., J. Micklewright & S. V. Schnepf, 2008: Social Segregation in Secondary Schools: How Does England Compare with Other Countries? Oxford Review of Education 34: 21–37.
Krolak-Schwerdt, S., I. Pit-ten Cate, S. Glock & F. Klapproth, 2015: Der Übergang vom Primar- zum Sekundarschulbereich: Übergangsentscheidungen von Lehrkräften. Pp. 57–62 in: Ministère de l'Éducation nationale, de l'Enfance et de la Jeunesse & Université du Luxembourg (eds.), Bildungsbericht Luxemburg 2015 Band 2: Analysen und Befunde.
LeTendre, G. K., B. K. Hofer & H. Shimizu, 2003: What Is Tracking? Cultural Expectations in the United States, Germany, and Japan. American Educational Research Journal 40: 43–89.
Lorenz, G., S. Lenz & C. Rjosk, 2023: Effizienz und soziale Ungleichheit in strikt leistungsdifferenzierenden Bildungssystemen. Eine kritische Betrachtung des Model of Ability Tracking (MoAbiT). Zeitschrift für Soziologie: 404–424.
Matthewes, S. H., 2021: Better Together? Heterogeneous Effects of Tracking on Student Achievement. The Economic Journal 131: 1269–1307.
Meier, V. & G. Schütz, 2007: The Economics of Tracking and Non-Tracking. Ifo Working Paper 50, 32.
Meinck, S., 2015: Computing Sampling Weights in Large-Scale Assessments in Education. Survey Methods: Insights from the Field 1–13.
Neuenschwander, M. & H.-U. Grunder, 2010: Schulübergang und Selektion. Zürich, Chur: Rüegger.
Oeltjen, M. & M. Windzio, 2019: Räumliche Segregation durch ungleiche Bildungskontexte? Kölner Zeitschrift für Soziologie und Sozialpsychologie 71: 651–675.
Pfeffer, F. T., 2008: Persistent Inequality in Educational Attainment and its Institutional Context. European Sociological Review 24: 543–565.
Rix, J. & N. Ingham, 2021: The Impact of Education Selection According to Notions of Intelligence: A Systematic Literature Review. International Journal of Educational Research Open 2, 100037.
Rodrigues, R. G., M. Meeuwisse, T. Notten & S. E. Severiens, 2018: Preparing to Transition to Secondary Education: Perceptions of Dutch Pupils with Migrant Backgrounds. Educational Research 60: 222–240.
Sacerdote, B., 2011: Peer Effects in Education: How Might They Work, How Big Are They and How Much Do We Know Thus Far? Pp. 249–277 in: E. A. Hanushek, S. Machin & L. Woessmann (eds.), Handbook of the Economics of Education. Amsterdam: Elsevier.
Schütz, G., H. W. Ursprung & L. Wößmann, 2008: Education Policy and Equality of Opportunity. Kyklos 61: 279–308.
Simonsohn, U., J. P. Simmons & L. D. Nelson, 2020: Specification Curve Analysis. Nature Human Behaviour 4: 1208–1214.
Singapore Examinations and Assessment Board, 2023: About PSLE. https://www.seab.gov.sg/home/examinations/psle/about-psle
Steegen, S., F. Tuerlinckx, A. Gelman & W. Vanpaemel, 2016: Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science 11: 702–712.
Strello, A., R. Strietholt, I. Steinmann & C. Siepmann, 2021: Early Tracking and Different Types of Inequalities in Achievement: Difference-in-Differences Evidence From 20 Years of Large-Scale Assessments. Educational Assessment, Evaluation and Accountability 33: 139–167.
Teorell, J., A. Sundström, S. Holmberg, B. Rothstein, N. Alvarado Pachon & C. M. Dalli, 2022: The Quality of Government Standard Dataset, version jan22. University of Gothenburg: The Quality of Government Institute. https://www.gu.se/en/quality-government
Terrin, É. & M. Triventi, 2022: The Effect of School Tracking on Student Achievement and Inequality: A Meta-Analysis. Review of Educational Research 93: 236–274.
Van de Werfhorst, H. G., 2019: Early Tracking and Social Inequality in Educational Attainment: Educational Reforms in 21 European Countries. American Journal of Education 126: 65–99.
Van de Werfhorst, H. G. & J. J. B. Mijs, 2010: Achievement Inequality and the Institutional Structure of Educational Systems: A Comparative Perspective. Annual Review of Sociology 36: 407–428.
Wing, C., K. Simon & R. A. Bello-Gomez, 2018: Designing Difference in Difference Studies: Best Practices for Public Health Policy Research. Annual Review of Public Health 39: 453–469.
Wong, H. M., D. Kwek & K. Tan, 2020: Changing Assessments and the Examination Culture in Singapore: A Review and Analysis of Singapore's Assessment Policies. Asia Pacific Journal of Education 40: 433–457.
World Bank, 2021a: The World by Income. https://datatopics.worldbank.org/world-development-indicators/the-world-by-income-and-region.html
World Bank, 2021b: World Development Indicators. https://databank.worldbank.org/source/world-development-indicators
Young, C. & K. Holsteen, 2017: Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis. Sociological Methods and Research 46: 3–40.

Footnotes

The bivariate association between school selectivity and homogeneity of learning environments (Figure A1) and social segregation (Figure A2) is provided in the Appendix.

By Maximilian Brinkmann; Nora Huth-Stöckle; Reinhard Schunck and Janna Teltemann


Maximilian Brinkmann, born 1990 in Remscheid. Studied economics and sociology in Düsseldorf, Wuppertal, and Groningen. Since 2020 research associate at the University of Hildesheim in the DFG research project "BiMiBi – Bildungssysteme und migrationsspezifische Bildungsungleichheit" (education systems and migration-specific educational inequality). Research interests: sociology of education, education systems, quantitative methods, and causal analysis.

Nora Huth-Stöckle, born 1989 in Aachen. Studied social sciences, with the subjects sociology, political science, and economics, at the University of Cologne. Studied sociology at the University of Duisburg-Essen. From 2017 to 2020 research associate at the GESIS Institute for the Social Sciences in Cologne in the BMBF research project "Solikris – Veränderung durch Krisen? Solidarität und Entsolidarisierung in Deutschland und Europa". Since 2020 research associate at the University of Wuppertal in the DFG research project "BiMiBi – Bildungssysteme und migrationsspezifische Bildungsungleichheit". Research interests: intergroup relations, prejudice, educational inequality. Most important publications: Explaining immigrants' social distance towards natives: A multilevel mediation approach across immigrant groups in Germany. Social Science Research, 114, 2023: 102907 (with E. Schlüter); Economic conditions and perceptions of immigrants as an economic threat in Europe: Temporal dynamics and mediating processes. International Journal of Comparative Sociology, 62(1), 2021: 56–82 (with B. Heizmann).

Reinhard Schunck, born 1979 in Bonn. Studied social sciences in Mannheim, Utrecht (Netherlands), and Bloomington, Indiana (USA). Doctorate in 2011 at the Bremen International Graduate School of Social Sciences, University of Bremen. From 2010 to 2016 at Bielefeld University. From 2016 to 2019 at the GESIS Leibniz Institute for the Social Sciences. Since 2019 Professor of Sociology at the University of Wuppertal (Bergische Universität Wuppertal). Research areas: social inequality, migration, family, and quantitative methods. Most important publications: Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command. The Stata Journal, 17(1), 2017: 89–115 (with F. Perales); Assortative Mating and Wealth Inequalities Between and Within Households. Social Forces, 102(2), 2023: 454–474 (with P. M. Lersch); Pretty unequal? Immigrant-native differences in returns to physical attractiveness in Germany. Journal of Economic Behavior & Organization, 215, 2023: 107–119 (with J. Hellyer, E. Hellriegel, J. Gereke).

Janna Teltemann, born 1980 in Uelzen. Studied sociology at the University of Bremen. From 2007 to 2016 research associate at the Institute for Empirical and Applied Sociology and at the Collaborative Research Centre 597 "Staatlichkeit im Wandel" (Transformations of the State) at the University of Bremen. Doctorate in 2012 at the University of Bremen. From 2016 to 2019 junior professor and from 2019 to 2023 W2 professor at the University of Hildesheim. Since 2023 W3 professor of sociology of education at the University of Hildesheim. Research areas: educational inequality, education policy, migration-related educational inequality, international comparative social research. Most important publications: Standardized Testing, Use of Assessment Data and Low Reading Performance of Immigrant and Non-Immigrant Students in OECD Countries. Frontiers 5, 2020 (with R. Schunck); Education systems, school segregation, and second-generation immigrants' educational success: Evidence from a country-fixed effects approach using three waves of PISA. International Journal of Comparative Sociology 57, 2016: 401–424 (with R. Schunck); Räumliche Segregation von Familien mit Migrationshintergrund in deutschen Großstädten: Wie stark wirkt die sozioökonomische Restriktion? Kölner Zeitschrift für Soziologie und Sozialpsychologie, 1, 2015: 83–103 (with S. Dabrowski & M. Windzio).

Title:	Achievement or Social Background? The Impact of Tracking on the Composition of Schools in an International Comparison.
Authors / contributors:	Brinkmann, Maximilian ; Huth-Stöckle, Nora ; Schunck, Reinhard ; Teltemann, Janna
Journal:	Zeitschrift für Soziologie, vol. 53 (2024-06-01), issue 2, pp. 164–185
Published:	2024
Media type:	academicJournal
ISSN:	0340-1804 (print)
DOI:	10.1515/zfsoz-2024-2014
Keywords:	Subjects: PROGRAMME for International Student Assessment; SOCIAL background; SEGREGATION in education; INTERNATIONAL schools; CLASSROOM environment; ACHIEVEMENT. Author keywords: Difference-in-Differences; Early between-school Tracking; Multiverse Analysis; Segregation; Bildungssystem; Differences-in-Differences; Gliederung; Multiverse-Analyse. Language of keywords: English; German
Other:	Indexed in: DACH Information. Language: English. Alternate title: Leistung oder Herkunft? Auswirkungen von Tracking auf die Zusammensetzung von Schulen im internationalen Vergleich. Document type: Article. Author affiliations: 1 = University of Hildesheim, Universitätsplatz 1, 31141 Hildesheim, Germany; 2 = University of Wuppertal, Gaußstraße 20, 42119 Wuppertal, Germany. Full text word count: 12038
