A Comparative Study of Lexical Bundles in Accepted and Rejected Applied Linguistics Research Article Introductions

Multi-word expressions known as ‘lexical bundles’ are vital components of effective writing, particularly in producing high-quality research articles that meet the standards of reputable journals. Numerous studies have examined the use of lexical bundles in various sections of research articles, including introductions, which lay the foundation for the research rationale. However, little attention has been paid to comparing accepted and rejected manuscripts. This study therefore investigates whether the use of lexical bundles differs between the introduction sections of accepted and rejected research articles in applied linguistics. Fifteen introductions from each data group were analyzed using frameworks of lexical bundle structures and functions. A descriptive quantitative analysis was employed to test for statistical differences and to explore the structural and functional patterns of the bundles. The results reveal that while both accepted and rejected introductions in applied linguistics rely on noun phrase-based and research-oriented lexical bundles, they differ significantly in the frequency and variety of these bundles. Specifically, the introductions accepted for publication contained more lexical bundles, and a wider variety of them, than their rejected counterparts. Additionally, the two data sets exhibited different subcategories of lexical bundles in many cases. These findings have important implications for future research on lexical bundles in academic writing and may help writers improve their chances of acceptance by journal reviewers.


INTRODUCTION
Research publication is integral to the academic realm; writing a research article (RA) and having it published are therefore indispensable skills for academics. The RA is widely regarded as a key genre for disseminating knowledge claims (Herrando-Rodrigo, 2014). Accordingly, the ability to produce and publish high-quality RAs is highly valued and continues to attract considerable interest, because a strong publication record is crucial for individual recognition and for establishing institutional prestige (Flowerdew & Habibie, 2021; Goldman et al., 2004; Suherdi et al., 2020, 2021). It is therefore common for authors to target reputable journals. Scholars tend to publish their RAs in reputable journals indexed by major bibliographic databases (Suiter & Sarli, 2019) because of their greater accessibility, readership, and citation rates, which can signal the credibility and impact of the researchers' work (Kurniawan, Dallyono, et al., 2019). To reap the benefits of publishing RAs, authors must observe the conventions for presenting knowledge in a way that is both understandable and acceptable to the intended audience (Metoyer-Duran, 1993; Nagano, 2015).
Structurally, an RA typically consists of an abstract followed by the introduction, method, results, and discussion (IMRD) sections (Swales & Feak, 1994). Each section has been regarded as an independent genre because it has its own norms (Kurniawan, Lubis, et al., 2019) and constitutes a set of communicative events with distinct communicative purposes (Kanafani et al., 2022; Swales, 1990). Among these sections, the introduction is perceived as the most challenging and time-consuming to write, even for experienced authors (Bajwa et al., 2020). This difficulty stems from the central role of the introduction in conveying research novelty and significance (Setiawati et al., 2021; Swales, 1990). In addition, the introduction must attract readers' interest in the topic under investigation by presenting the research rationale, moving from the general research background to the specific research questions or hypotheses (Swales & Feak, 1994). Authors are thus left with "an unnerving wealth of options" in deciding how much information to include and how to present it most directly and appealingly (Swales, 1990, p. 137). In the context of publishing research articles, a well-crafted introduction is essential for convincing reviewers that the article is worthy of publication (Lim, 2012; Luthfianda et al., 2021).
The construction of an introduction is closely tied to the choice of expressions, which is essential for fulfilling its communicative purpose and meeting the expectations of journal reviewers. This is consistent with the nature of genre, which requires specific expressions to communicate effectively (Ellis et al., 2008; Hyland, 2012). Formulaic expressions are also necessary for fluent writing, as they demonstrate the author's ability to produce discourse that is familiar to their community (Coxhead & Byrd, 2007; Wray, 2002). In an RA introduction, the appropriate use of expressions signals the author's understanding of the generic practices of RA writing. One type of expression commonly used to construct a well-crafted introduction is the lexical bundle (LB). LBs are defined as extended collocations: sequences of three or more words that tend to co-occur in discourse (Biber et al., 1999). They are viewed as building blocks that aid the construction of cohesive discourse (Biber et al., 1999; Cortes, 2013) and as signaling units that help the audience anticipate upcoming information in the discourse (Nesi & Basturkmen, 2006).
The characteristics that set LBs apart from other expressions are their frequency, fixedness, idiomaticity, and structural status (Biber et al., 1999; Cortes, 2004). LBs occur far more frequently in a body of discourse than pure idioms, which occur rarely or not at all. Their fixedness is also frequency-oriented: only word combinations that meet the determined frequency criteria qualify as LBs, regardless of their alternate forms. LBs also exhibit a degree of flexibility that distinguishes them across different discourses. While some LBs occur as fixed sequences, such as 'there's a lot of', which is more common in spoken discourse, others are typically found in written discourse and take the form of frames composed of function words with variable slots filled by content words (e.g., 'in the (case/context/field) of') (Biber, 2009). Most LBs are, nevertheless, non-idiomatic and structurally incomplete. Rather than forming complete structures, most of them bridge two structural units: they start at a phrase or clause boundary and end with the first elements of the second unit. Because Biber's work is regarded as the leading research on LBs, frameworks such as Biber's (2009) have often served as the basis for subsequent studies. One of the most notable is the framework of Biber et al. (2004), who classified LBs in university teaching and textbooks by structure and function. The structural classification is presented in Table 1. In terms of function, Biber et al. (2004) proposed three major categories: stance expressions, discourse organizers, and referential expressions. Building on this framework, Hyland (2008a) modified the functional categorization because his study centered on academic prose (e.g., research articles, doctoral dissertations, and master's theses).
This modified categorization has since become another prominent framework and has been widely adopted in similar studies. The frameworks are shown in Table 2.
Table 2. Functional classification of LBs (Biber et al., 2004; Hyland, 2008b)
Using these frameworks, numerous studies have examined LBs in different sections of RAs, such as abstracts (Shahriari et al., 2013), introductions (Jalali & Moini, 2014), discussions (Jalali & Moini, 2018), introductions, methods, and results in comparison (Shahriari, 2017), abstracts and conclusions in comparison (Shahmoradi et al., 2021), and entire RAs (Budiwiyanto & Suhardijanto, 2020). As this list shows, studies focusing specifically on LBs in the introduction section remain scarce. This gap is notable given that, as noted earlier, the introduction is a challenging part of the RA that plays a crucial role in conveying the significance of the research and in increasing the chances of publication. The most closely related study, conducted by Jalali and Moini (2014), examined the introduction sections of published medical (hard science) RAs and found that their LBs were predominantly phrasal, namely noun phrase-based. It is worth considering whether LBs might manifest differently, both structurally and functionally, in the introduction sections of accepted and rejected RAs. However, this issue has not been explored in previous research.
Hence, this study sought to analyze and compare LBs in the introduction sections of accepted and rejected soft science RAs, specifically in applied linguistics. To this end, it addressed two research questions: (1) How does the structural manifestation of LBs differ between accepted and rejected applied linguistics research article introductions? (2) How does the functional manifestation of LBs differ between accepted and rejected applied linguistics research article introductions? By addressing these questions, this study aims to better understand the use of LBs in introduction sections and how they may increase the likelihood of RA publication in target journals.

Research Design
This study employed a descriptive quantitative design to achieve its aim, namely, to analyze and compare the manifestation of LBs in accepted research article introductions (ARAIs) and rejected research article introductions (RRAIs). This design was used to address both research questions (RQs): (1) How does the structural manifestation of LBs differ between accepted and rejected applied linguistics research article introductions? and (2) How does the functional manifestation of LBs differ between accepted and rejected applied linguistics research article introductions? Within this design, a Z-test was used to evaluate differences in the proportions of LBs between the two groups, testing the following hypotheses:
• H0: There is no significant difference in the proportion of LB occurrences in the two corpora.
• H1: There is a significant difference in the proportion of LB occurrences in the two corpora.
The significance level (α) for the Z-test was set at 0.05; H0 was retained when p > 0.05 and rejected otherwise. The results are presented in figures, tables, explanations, and excerpts to facilitate understanding of the findings.
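The paper does not state which Z-test variant was used. The following Python sketch assumes the standard pooled two-proportion z-test with a two-sided p-value, comparing the count of one bundle category against each corpus's total bundle count. The function names and the counts in the usage lines are hypothetical, not data from this study.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided pooled z-test for the difference between two proportions.

    x1, x2: occurrences of one bundle category (e.g., NP-based) in each corpus.
    n1, n2: total bundle occurrences in each corpus.
    Returns (z, p_value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("corpus totals must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("category absent (or universal) in both corpora; test undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Decision rule as described in the text: retain H0 when p > alpha."""
    return "retain H0" if p_value > alpha else "reject H0"


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p = two_proportion_z_test(60, 100, 40, 100)
    print(f"z = {z:.3f}, p = {p:.4f}, {decide(p)}")
```

With these hypothetical counts (60 of 100 versus 40 of 100), the script prints `z = 2.828, p = 0.0047, reject H0`. The pooled standard error reflects the H0 assumption that both corpora share one underlying proportion; a category absent from both corpora leaves the test undefined, hence the guard.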

Corpora
The data comprised 30 research articles from 2018 to 2021, selected for their recency. All articles were sourced from the Indonesian Journal of Applied Linguistics (IJAL) and were obtained with the permission of the journal administrator and the consent of the authors. IJAL was chosen because it is a reputable international Scopus-indexed journal of applied linguistics in Indonesia, with an SJR score of 0.297 in 2021. The 30 RAs were divided into two corpora of 15 introduction sections each, as shown in Table 3. This number of RAs was deemed sufficient because the introductions totaled 44,352 words, which falls within the range of corpus sizes examined by Chen and Baker (2016), from 26,000 to 88,000 words. Furthermore, a much larger corpus was not feasible because obtaining consent from the authors of rejected articles was challenging.

Instrument and Data Analysis
The LBs in this study were identified using the corpus software AntConc 3.5.9 (Anthony, 2020), following Hyland and Jiang (2018). In our view, AntConc is one of the best-designed and easiest-to-use corpus tools available. However, the same chunks often recur many times in its output, making manual filtering necessary.
Because AntConc requires plain-text input, each RAI was extracted from its RA and saved as a plain-text file. The ARAI files were coded A1-A15, and the RRAI files R1-R15. Each corpus was then imported into the software and analyzed with the 'N-grams' feature. As the corpora were small, the cut-off criteria proposed by Biber and Barbieri (2007) were adopted: a word combination had to occur at least three times and in at least three different texts, to avoid capturing individual authors' idiosyncrasies. Furthermore, this study focused on three- and four-word bundles, as these lengths have been shown to be manageable and to capture relevant expressions (Lee, 2020). The resulting lists of LBs were classified structurally according to Biber et al.'s (2004) framework (see Table 1) and functionally according to Hyland's (2008b) framework (see Table 2) to guide the comparison.
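AntConc performed this step in the study; the sketch below only illustrates the frequency and range cut-offs and is not AntConc's own algorithm. It assumes a simplified tokenizer (lowercase words, punctuation dropped, sentence boundaries ignored), so its counts may differ from AntConc's. The function names and the `texts` mapping from file codes (e.g., A1) to introduction text are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Simplified tokenizer (assumption): lowercase alphabetic words,
# optionally with one apostrophe part (e.g., "there's").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def extract_bundles(texts: dict[str, str], n: int,
                    min_freq: int = 3, min_range: int = 3) -> dict[str, tuple[int, int]]:
    """Return n-grams meeting both cut-offs as {bundle: (frequency, range)}.

    frequency: total occurrences across all texts in the corpus.
    range: number of distinct texts in which the n-gram occurs.
    """
    freq = Counter()
    found_in = defaultdict(set)
    for text_id, text in texts.items():
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            found_in[gram].add(text_id)
    return {
        gram: (count, len(found_in[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(found_in[gram]) >= min_range
    }
```

Calling `extract_bundles(texts, n=3)` and `extract_bundles(texts, n=4)` would yield candidate three- and four-word lists. The range condition means a sequence repeated three times by a single author is rejected, which is the idiosyncrasy safeguard the text describes; the candidates would still need the manual filtering described in the text.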

Data Trustworthiness
To reduce subjectivity, inter-coder reliability was assessed. After compiling the RAI corpora, we extracted three- and four-word clusters, imported the results into a spreadsheet, and coded each bundle for its grammatical structure and function. The authors and a trained student coder independently checked and coded the LBs in the two corpora, reaching 95.48% agreement. Disagreements were then discussed over successive passes and resolved to refine the coding results.
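The paper reports a single agreement figure without naming the index. Assuming it is simple percent agreement, that is, the share of bundles to which both coders assigned the same code, it could be computed as in this minimal Python sketch; the function name and code labels are hypothetical.

```python
def percent_agreement(codes_a: list[str], codes_b: list[str]) -> float:
    """Percentage of items that two coders labeled identically."""
    if not codes_a or len(codes_a) != len(codes_b):
        raise ValueError("both coders must label the same non-empty list of bundles")
    agreed = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreed / len(codes_a)


if __name__ == "__main__":
    # Hypothetical structural codes assigned to four bundles by two coders.
    coder_1 = ["NP", "PP", "VP", "NP"]
    coder_2 = ["NP", "PP", "DC", "NP"]
    print(percent_agreement(coder_1, coder_2))  # 75.0
```

Percent agreement does not correct for agreement expected by chance; chance-corrected indices such as Cohen's kappa are a common complement when categories are few.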

RESULTS AND DISCUSSION
This section first presents the LBs identified in the 15 selected RAIs from each of the accepted and rejected groups, then reports their structural and functional analyses to answer the research questions, and finally discusses the implications of LB use in both groups.
Identifying LBs with AntConc's N-grams feature yielded different numbers of LBs in the two corpora. In the ARAI corpus, the program generated 136 three-word and 19 four-word bundles. The RRAI corpus, in contrast, yielded fewer LBs: only 89 three-word and 12 four-word bundles.
However, many of the initial bundles in both corpora overlapped, so some were manually eliminated using the exclusion criteria proposed by Salazar (2014):
a. Fragments of other bundles. This criterion eliminates short bundles that are incorporated into longer bundles with the same or similar frequency. For example, the three-word bundle 'on the other' and the four-word bundle 'on the other hand' each occurred eight times in the corpus. The concordance lines showed that 'on the other' always occurred as a fragment of 'on the other hand'; 'on the other' was therefore excluded.
b. Bundles ending in articles. This criterion excludes longer bundles that merely extend a shorter bundle with a final article. For instance, the four-word bundle 'in accordance with the' extends 'in accordance with', and both had the same frequency. Since the article 'the' added no information to the bundle, 'in accordance with the' was disregarded.
c. Bundles composed exclusively of function words that have no textual or semantic function, such as 'has not been'.
d. Bundles containing numbers, such as 'two or more'.
e. Meaningless bundles, such as 'et al in'.
After this exclusion step, the final lists comprised 121 bundles in ARAIs and 78 bundles in RRAIs, as detailed in Table 4. Table 4 reveals a clear disparity in LB occurrences, with ARAIs containing considerably more bundles than RRAIs, particularly three-word bundles. This suggests a discernible difference between ARAIs and RRAIs, at least in the number of LB occurrences. Additionally, Table 5 lists the three most frequent three- and four-word bundles in each corpus to provide a quick overview.
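Criteria (a) and (b) are mechanical enough to sketch in code, whereas (c) to (e) require human judgment, and the study also checked concordance lines rather than relying on frequencies alone. The Python sketch below is one possible reading of the two frequency-based rules as described above, assuming a dictionary mapping each candidate bundle to its frequency. The `tolerance` parameter, standing in for "same or similar frequency", is a hypothetical addition that defaults to exact equality. Because one pair can satisfy both rules, (b) is checked first so that 'in accordance with the' is dropped rather than 'in accordance with'.

```python
ARTICLES = {"a", "an", "the"}


def contains(longer: tuple[str, ...], shorter: tuple[str, ...]) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def apply_exclusions(freqs: dict[str, int], tolerance: int = 0) -> dict[str, int]:
    """Apply criteria (a) and (b) to {bundle: frequency}; return the kept bundles."""
    tokens = {bundle: tuple(bundle.split()) for bundle in freqs}
    dropped = set()
    for short_b, s_tok in tokens.items():
        for long_b, l_tok in tokens.items():
            if len(l_tok) <= len(s_tok) or not contains(l_tok, s_tok):
                continue
            # "Same or similar frequency": tolerance is a hypothetical threshold.
            if abs(freqs[short_b] - freqs[long_b]) > tolerance:
                continue
            if l_tok[-1] in ARTICLES and l_tok[:-1] == s_tok:
                dropped.add(long_b)   # (b) longer bundle only adds a final article
            else:
                dropped.add(short_b)  # (a) shorter bundle is a fragment of the longer one
    return {bundle: f for bundle, f in freqs.items() if bundle not in dropped}


if __name__ == "__main__":
    # Frequencies for 'on the other (hand)' follow the text; the others are hypothetical.
    candidates = {"on the other": 8, "on the other hand": 8,
                  "in accordance with": 5, "in accordance with the": 5,
                  "the use of": 12}
    print(apply_exclusions(candidates))
```

On these candidates the sketch keeps 'on the other hand', 'in accordance with', and 'the use of'. Pairs whose frequencies differ beyond the tolerance are both retained, since the shorter bundle then also occurs independently.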
Although Table 5 shows that the bundles extracted from the ARAI and RRAI data sets are relatively distinct, both data sets share the same most frequent three-word bundle, 'the use of', with similar frequencies. This suggests some similarity in the patterning of ARAIs and RRAIs.

LBs Structures in Accepted and Rejected RAIs
This subsection compares the LB structures in the two data groups to answer the first research question. The LBs in this study broadly fit the structural taxonomy of Biber et al. (2004). However, some bundles required more specific categories. Hence, additional structural categories were adopted from Nasrabady et al. (2020): bundles incorporating adjectival phrases, adverbials, be+adjective/adverb+to structures, and bundles beginning with conjunctions. Figure 1 shows the distribution of structural categories of LBs in this study. Most LBs in the two corpora were phrasal rather than clausal, in line with Jalali and Moini's (2014) findings. As Figure 1 demonstrates, bundles containing NP fragments accounted for 43.23% of the total frequency in ARAIs and 37.96% in RRAIs. PP fragments ranked second, with frequencies above 20% in both groups. These findings are unsurprising given that, as shown in Table 5, the three most frequent bundles in both corpora were built from NP and PP fragments.
By contrast, the structural categories taken from Nasrabady et al. (2020), i.e., adjectival phrases, adverbials, be+adj/adv+to structures, and bundles beginning with conjunctions, accounted for low percentages. These bundle types functioned similarly in the two corpora. Adjectival phrases (e.g., 'related to the') modified the preceding noun. Bundles beginning with conjunctions (e.g., 'and so on', 'and the students') indicated additional information. The least frequent type, the be+adj/adv+to structure (e.g., 'be able to'), denoted ability. Lastly, adverbials, which had the highest percentage among these minor categories, served either as verb modifiers (1) or as conjunctive adverbs (2).
(1) In Japanese learning in Indonesia, especially in grammar teaching, the instructor generally only explains the kaku-joshi based on the external structure (shinsou-kouzou) by solely underlining its parallel in Indonesian. [A12]
(2) …On the other hand, as a relation of power, the explanation stage tends to portray the outcomes of the social struggle in determining the power within a certain society. [R12]
Table 6 illustrates that bundles with NP fragments were the most frequent structure. Although their frequency proportions differed by 5.27% between the two corpora, the Z-test showed that this difference was not significant (p > 0.05). This indicates that the authors of both groups of RAIs in this study relied similarly on noun phrases. By contrast, the Z-test revealed significant proportion differences for bundles with VP fragments and for the be+adj/adv+to structure (see Table 6, in bold). H0 was thus rejected for these two structures and retained for the rest. The significant difference in the be+adj/adv+to structure most likely resulted from its complete absence in RRAIs. For a more detailed comparison, we further examined the substructures of the LBs, as shown in Table 7. We also made several adjustments to the classification for greater clarity and organization. First, following Berūkštienė (2018) and Pearson (2021), the subcategory 'connector + 3rd person pronoun + VP fragments' was renamed 'pronoun/noun + VP fragments'. Second, following Nasrabady et al. (2020), who identified adverbial clauses and adjective clauses as additional substructures of bundles incorporating DC fragments, these two substructures were added. However, bundles incorporating adjectival phrases, adverbials, be+adjective/adverb+to structures, and bundles beginning with conjunctions were excluded from the following analysis since they had no substructures. As Table 7 shows, the Z-test revealed significant differences (p < 0.05) in four substructures: VP fragments with WH-questions, DC fragments with adverbial clauses, NP fragments with NP + of-phrase fragments, and NP fragments with NP + other post-modifier fragments. H0 was thus rejected for these four substructures.
Table 7 highlights an interesting aspect of VP fragments: the WH-question substructure was the rarest, with a gap of 4.26% in percentage of occurrence. This substructure appeared only in RRAIs, where authors used it to frame their research questions. It was absent from ARAIs because those authors phrased their questions more variously, for example as how or why questions whose frequencies did not meet the cut-off criteria of this study. In contrast, RRAIs showed a relatively uniform question type (i.e., 'what is the'), which was therefore detected as a bundle. The remaining three substructures showed much higher percentages of occurrence, although their proportion differences were not significant. The first subcategory, pronoun/noun + VP fragments, mainly indicated (non-)existence (3) or research objectives (4).

VP-based bundles
(3) ... they decided to give up the course because there is no interesting material for them. [R14]
(4) This study aims to reveal the clarity of the translations of the Qur'anic imperative verses that have certain pragmatic meanings. [A2]
An interesting discrepancy lay in the last two substructures, i.e., bundles with passive and non-passive verbs. Previous studies found that VP-based bundles in academic discourse tend to take passive verb structures to highlight the result of an action rather than the doer (Biber et al., 1999; Jalali & Moini, 2014; Kwary et al., 2017). Consistent with this, the ARAIs in this study exhibited more bundles with passive verbs.
(5) Hence, analogical reasoning is considered to be a learning strategy that incorporates daily life context into narrative texts to promote students' critical thinking skills. [A10]
(6) In this respect, language is defined as a resource for making meaning that evolves to serve certain human needs depending on the context it is used. [A3]
(7) A specific meaning of repetition usually refers to the act of reproducing the linguistic elements of the previous phrase (words and grammar) in precisely the same manner.
Bundles with passive verbs in RRAIs mainly consisted of 'be + used' (e.g., 'is used to'), whereas in ARAIs they were more varied (5, 6). Additionally, non-passive verbs were more frequent in RRAIs and were often used to define (7) or to cite relevant studies (8).

DC-based bundles
The adverbial clause fragments were the only DC-based substructure with a significant proportion difference, likely because this substructure occurred in only one corpus, ARAIs. Adverbial clauses add to textual complexity, which is one of the standard features of scientific writing (Schleppegrell, 2004). Adverbial clauses, particularly those using connectives, as shown in (9), help link ideas to build coherent texts (Fang, 2006). The next substructure, adjective clause fragments, accounted for the same percentage as adverbial clauses in ARAIs but was slightly more prevalent in RRAIs. This type was marked by relative pronouns (e.g., 'that can be', 'which in turn'). The to-clause was the most frequent DC substructure in both corpora, appearing as an infinitive subordinate clause that explains an action (e.g., 'to analyze the', 'to refer to'). Lastly, that-clauses, subordinate clauses beginning with 'that', were used to state a proposition (e.g., 'that there are', 'that it is').

NP-based bundles
The NP-based substructure analysis shows that NP + of-phrase fragments were the most prevalent substructure, accounting for over 70% of occurrences in both data sets. Table 5 also shows that this type dominated the three most frequent bundles, with 'the use of' being the most frequent bundle in both groups. This is consistent with Gil and Caro (2019), who found 'the use of' to be the most commonly used bundle in linguistics RAIs. Overall, NP + of-phrase fragments were used to identify attributes, events, quantities, or specific entities, as elaborated further in the functional analysis.
The second most common NP-based substructure was other NP expressions, which typically signified the research topic (e.g., 'teaching and learning process', 'a foreign language'). The least frequent type was NP + other post-modifier fragments, which in ARAIs commonly contained a preposition (10). In RRAIs, by contrast, the only recurring bundle of this type was 'the fact that'.
(10) The relationship between language and law does not only include the interpretation of legal language but also the aspects related to the law in practice such as proof, prosecution, renunciation, and final decision. [A15]
The prevalent use of NP-based bundles in RAIs indicates a strong tendency toward nominalization. This is unsurprising, given that nominalization is a distinctive feature of academic writing (Biber & Gray, 2010) that can effectively encapsulate dense knowledge in RAIs (Mehrabi et al., 2018) and present complex ideas concisely and coherently.

PP-based bundles
PP expressions emerged as another common substructure and were frequently used to specify the context of the research topic (e.g., 'in the classroom', 'of a text'), to connect the authors' arguments (e.g., 'in addition to'), or to show the role of something through 'as' (e.g., 'as a way'). These forms were relatively similar in the two groups. The comparative expression substructure, by contrast, was realized only by 'as well as' in both corpora. However, its overall frequency was higher in ARAIs, making it the second most common three-word bundle there. 'As well as' was used to mark two elements as equally important.

LBs Functions in Accepted and Rejected RAIs
This subsection addresses the second research question by comparing LB functions in the two groups of RAIs. First, the final lists of LBs were functionally classified following Hyland's (2008b) framework. As Figure 2 shows, research-oriented bundles made up more than half of both corpora. This partly aligns with several other studies (e.g., Khamkhien, 2021; Shirazizadeh & Amirfazlian, 2021; Yuliawati et al., 2021), which found that research-oriented bundles constituted about half of the bundles in linguistics RAs. This result suggests that the authors of both RAI groups shared a tendency to focus on presenting background knowledge and research objectives in their introductions. That is, they tended to minimize their presence to retain impersonality and to foreground knowledge construction for readers (Candarli & Jones, 2019; Charles, 2006; Hyland, 2008b).
As in the structural analysis, the subfunctions of LBs were analyzed to obtain a clearer understanding of their occurrence in the two corpora, as presented in Table 8. Additional frameworks were employed to analyze bundles that did not fit Hyland's (2008b) framework. The additional subfunctions were doubling, exemplifier, and questioning from Nasrabady et al. (2020), and grouping, citation, generalization, and objective from Salazar (2014). Salazar's (2014) division of Hyland's (2008b) transition and resultative signal subfunctions into additive, comparative, inferential, and causative subfunctions was also adopted. Table 8 highlights apparent statistical differences between ARAIs and RRAIs in the distribution of LB functions and subfunctions: the occurrences of every subfunction differed between ARAIs and RRAIs. The following subsections examine these results in more detail.

Research-oriented bundles
To begin the analysis, it is crucial to examine the two most frequent research-oriented subfunctions, procedure and description, both of which showed significant proportion differences according to the Z-test. ARAIs exhibited a higher frequency of description bundles (34.65%) than of procedure bundles (32.68%). In contrast, RRAIs displayed a higher frequency of procedure bundles (47.75%) than of description bundles (25.26%). This suggests that, unlike the authors of accepted articles, the authors of rejected articles tended to use more expressions indicating procedures than descriptions. Some prior studies of published RAs reported that procedure bundles were more prevalent than description bundles (e.g., Pourmusa, 2014; Shirazizadeh & Amirfazlian, 2021). The different pattern in ARAIs may have arisen because this study focused specifically on RAIs rather than on entire RAs.
In this study, procedure bundles were more prominent in RRAIs, where they were used to present events (11) or actions (12). In contrast, the authors of ARAIs noticeably preferred to depict their topics with description bundles indicating research content (13), attributes (14), or existence (15). This preference is understandable: the RAI gives authors the opportunity to illuminate their research topic, so they emphasize its salient features to make it stand out as worth investigating. The Z-test revealed two additional subfunctions with significant proportion differences: location and topic bundles. Location bundles had the smallest proportion in both corpora (only 1.69% in ARAIs and none in RRAIs). They signified the context in which the topic was placed (16). Meanwhile, topic bundles, which appeared about 8% more frequently in ARAIs, reflected the authors' intention to highlight the main topic of their study using other NP expression structures, as discussed in the structural analysis of NP-based bundles.
(16) In other words, voice is co-constructed or inherently involves the role of others to produce it in a text. [A6]
The last three subfunctions were relatively sparse in both corpora. To single out one entity or denote clusters, the authors of RRAIs favored quantification bundles, in which numerical expressions were stated more explicitly (e.g., 'one of the', 'a number of'), whereas ARAI authors used more grouping bundles (e.g., 'a part of', 'a set of'). Lastly, doubling bundles referred to two entities in the research, and their forms were identical in both corpora (e.g., 'learning and teaching', 'teachers and students').

Text-oriented bundles
According to the Z-test results, several text-oriented subfunctions showed significant proportion differences. The first was inferential bundles, which, according to Salazar (2014), underscore the interpretations or conclusions drawn from the information in a study. Inferential bundles (17) accounted for 2.11% of the text-oriented bundles in RRAIs, but none were found in ARAIs. ARAIs instead tended to adopt more causative bundles (18) to clarify cause-effect relations. To some extent, causative bundles seemed compatible with the objective of RAIs, as they help unify ideas (Budiwiyanto & Suhardijanto, 2020). By contrast, bundles that infer ideas tend to be more common in results or conclusion sections (Gil & Caro, 2019). The second subfunction, the structuring signal, was 7.16% higher in ARAIs. In this study, these bundles were used to signal the structure of the study. Examples of structuring signal bundles in ARAIs included 'the main foundation in this study' and 'the objectives/focuses/results of the study.' In contrast, the only structuring bundle detected in the RRAI corpus was 'of this study.' The next significant proportion difference was in objective bundles, which introduce the authors' or the research's aims. Their forms were largely the same in both corpora (e.g., 'to examine the', 'this study aims'); however, their raw frequency was higher in RRAIs (20 occurrences) than in ARAIs (12 occurrences). Questioning bundles, which establish research questions, were likewise more evident in RRAIs, because the question forms in ARAIs varied from one paper to another, as mentioned in subsection 3.1.1. Conversely, ARAIs contained exemplifier bundles (e.g., 'such as the'), none of which were found in RRAIs.
Despite these differences, framing signals were the most common text-oriented subfunction in both corpora, with nearly identical percentages. The prevalence of framing bundles reinforces the results of previous studies (e.g., Hyland, 2008a; Jalilifar & Ghoreishi, 2018; Savelyeva, 2021). Framing bundles help specify a context (19) or a limitation (20), so that readers know the conditions under which the information can be accepted. Hence, this type may contribute to a more effective and straightforward RAI. The other text-oriented subfunctions, i.e., additive, comparative, citation, and generalization bundles, were similar in proportion across both groups. Additive and comparative bundles, which serve as transitional markers, were generally realized by connectives such as 'in addition to' and 'on the contrary.' Meanwhile, citation bundles (e.g., '(is) in line with', 'according to the', or 'stated that the'), whose function is to cite sources, occurred 35 times in ARAIs but only 13 times in RRAIs. Citations are essential in RAIs because authors must justify the worthiness of their study by providing supporting information. Additionally, to show the novelty of their research, authors need to review and cite relevant previous studies to identify the gap to be addressed (Belcher, 2019; Swales, 1986).
Generalization bundles, by contrast, were more frequent in RRAIs. This subfunction signals information agreed upon in the related literature and helps communicate abstract knowledge (Tessler & Goodman, 2019). In RRAIs, the bundle 'refers to the' carried this function. Interestingly, despite their lower frequency, generalization bundles in ARAIs showed more distinct forms (e.g., 'is defined as', 'is known as').

Participant-oriented bundles
Both participant-oriented subfunctions showed significant proportion differences. The first subfunction, the stance feature, was primarily used as a hedging device that keeps authors from being fully responsible for their claims and softens assertions for the reader. In RRAIs, every stance bundle contained the modal 'can', indicating ability (21) or possibility (22). 'Can' was also a frequent modal among stance bundles in ARAIs. However, ARAIs also contained one distinct bundle, 'need to be', which carries a stronger sense of necessity (23).
(21) Instagram can be used to help students learning foreign language and be the way for teachers to boost up students' language learning autonomy ... [R6]
(22) In addition, the contents of a suicide note can be a complaint or motive on why the victim committed suicide. [R3]
(23) Seeing all the trends, the ways of doing teaching need to be suited to this changing nature of learning. [A11]
The engagement feature, the least common participant-oriented subfunction, was absent from the RRAI corpus. In the ARAI corpus, the only engagement bundle identified was 'can be seen', which directed readers to other parts of the text to understand the context being discussed (24). This absence of engagement bundles may suggest a lack of interactivity and direct involvement with readers in rejected research articles. Meanwhile, the limited presence of such bundles in accepted articles suggests that their authors may have used them strategically to guide the reader's attention and understanding.
(24) At last, it can be seen that the highest number types of theme is topical theme. [A14]
Overall, the limited use of participant-oriented bundles is consistent with the results of previous research in this area (e.g., Bal-Gezegin, 2019; Hyland, 2008a; Jalali et al., 2015). This is partly due to the 'weight' of academic writing, which is often impersonal (Mauranen & Bondi, 2003), particularly in RAIs, where the focus is on presenting background that supports the value of the research.

CONCLUSION
The findings reveal that LBs are manifested differently in the introduction sections of applied linguistics RAs that were accepted or rejected by a Scopus-indexed journal. Analyzing the structure and function of LBs revealed both similarities and significant differences between the two datasets used in this study.
Across almost all categories, the results suggest that the authors of ARAIs were more familiar with the expressions commonly used in introduction sections, as evidenced by the higher frequency and variety of their LBs. A closer examination of LB structures showed that noun phrase-based bundles were commonly used in both groups. This likely reflects the prevalence of nominalization, which allows background knowledge to be incorporated densely into RAIs. However, a notable difference emerged in the ratio of bundles with passive and non-passive verbs. ARAIs contained more passive verbs, which downplay the presence of the doer and guide readers to focus on the result of an action. Bundles containing non-passive verbs, meanwhile, were more common in RRAIs, where various entities were often placed as the subject of the propositions. Regarding functions, research-oriented bundles accounted for the largest share in both groups, suggesting that the authors were aware of the need to explain their research context in detail in RAIs. The most noticeable difference lay in the description and procedure subfunctions: ARAIs employed more description bundles to illuminate the attributes of the research topic, while RRAIs used more procedure bundles to denote related events or actions. Finally, framing bundles and stance features were the most prevalent text- and participant-oriented subfunctions, respectively. In both data groups, framing bundles specified the context, and stance features hedged the authors' assertions.
Although based on small corpora, this study aims to raise awareness and provide helpful guidance for the effective construction of RAIs, particularly to increase the chance of RA acceptance in target journals by using the most common LBs as discourse building blocks. It is important to acknowledge that the forms, structures, and functions of LBs are not universally standardized and may vary across discourse communities and conventions. As such, the patterns of LB use observed in this study may not apply to other contexts or fields. To gain a more comprehensive understanding of LB usage in RAIs, future studies could expand their corpora to include a wider range of reputable journals and publishers across different disciplines. Moreover, more advanced analytical procedures and tools could help improve the accuracy and reliability of the results.