Validation of Science, Technology, Engineering, and Mathematics-Based Diagnostic Assessment on Natural Resource Material for Phase B Elementary School Students using Rasch Model

One of the ways to design learning in the implementation of the independent curriculum is the diagnostic assessment. This study aims to develop STEM-based questions to measure students' critical thinking skills. The questions are used for the initial assessment of the science subject material "Natural Resources" in phase B of elementary school. The type of research used is research and development (R&D) with the ADDIE development model, consisting of the analysis, design, development, implementation, and evaluation stages. The subjects were elementary school students in the city of Semarang, a total of 258 students consisting of 126 male and 132 female students, selected using convenience sampling. The item validity and reliability analysis techniques used the Aiken index and the Rasch model. The results showed that all items had a high validity category based on the following aspects: item outfit, measure, item difficulty level, and person and item reliability. There are several phase-biased items, where students in phase C benefit more than students in phase B. The average respondent ability is above the average item difficulty level. The conclusion is that the items developed can be used as an initial assessment in science subjects.


Introduction
The new learning paradigm ensures that the practice of learning is learner-centered (Azis et al., 2022). It provides educators the flexibility to formulate learning designs and assessments according to the characteristics and needs of students (Phil, 2021). In the new paradigm, the learning and assessment cycles are interrelated so that students can achieve the expected competencies. Using appropriate assessments helps teachers identify the weaknesses in learning so that they can improve methods in the learning process (Rosnaeni, 2021).
Assessment can be interpreted as comparing or applying measurement results to assign value to an object (Zainal, 2020). Assessment covers three domains that students carry: knowledge, attitudes, and skills (Jeanne & Prendergast, 2020). Applying diagnostic assessments before designing learning is a hallmark of the independent curriculum. The main direction of learning in the independent curriculum is that students can take advantage of and hone their competencies in their social life (Nurmasyitah et al., 2023). The independent learning curriculum is specifically designed to grant the right to learn independently (Inayati, 2022). The independent curriculum is expected to offer diverse extracurriculars (Purnawanto, 2022). An independent curriculum makes learning more optimal, giving students enough time to understand concepts and strengthen their competencies.
Based on observations from several elementary schools in the city of Semarang, and as a first step in implementing the independent curriculum, the researchers developed an assessment of questions. The questions developed are STEM-based. Through STEM learning, students gain knowledge in science, technology, engineering, and mathematics. That knowledge can then be used to solve real-life problems and be applied meaningfully in life (Jang et al., 2022). The benefits of implementing STEM education are that it can improve critical and creative thinking skills. In addition, students can become more logical, creative, innovative, and productive, and stay directly connected to real conditions (Wang & Wang, 2023).
Science is commonly referred to as natural science. Science studies how this universe works, so science can be called the study of nature itself. In the 21st century, four skills must be met, namely 1) digital-era literacy, 2) inventive thinking, 3) effective communication, and 4) high productivity (Patonah et al., 2020). Scientific literacy is the part of literacy that includes knowledge and understanding of scientific concepts and processes (Cai-Ting et al., 2020). Science skills are a must-have in the 21st century and can be realized in everyday life. The challenge of learning in the 21st century is integrating pedagogy, technology, and assessment while involving teachers, students, and leaders as stakeholders (Sustiningsih et al., 2021).
Previous research has proven that developing STEM-based test assessment instruments is feasible for measuring students' creative thinking ability (Agustina, 2022). STEM is a modern approach that integrates all of its aspects to solve problems that arise in 21st-century life (Hebebci & Usta, 2022). STEM learning aims to solve problems in everyday life by applying it in schools, where learning combines the knowledge, skills, and attitudes students possess (Subayani, 2022).
Applying the right STEM approach model can affect students' success in understanding the material and can motivate students to discover learning concepts (Izzati et al., 2019). Thus, developing this instrument is necessary as an initial assessment for implementing the independent curriculum. This study aims to develop STEM-based questions on natural resource materials for phase B elementary school students as the first step in implementing the independent curriculum.

Methods
This research takes the form of research and development (R&D). Research and development is a research method used to develop or validate products for teaching and learning (Nugroho & Airlan, 2020). This research was used to develop an assessment test to measure the critical thinking skills of phase B elementary school students on natural resource materials. The research refers to the model developed by Dick and Carey (1996): the analysis, design, development, implementation, and evaluation (ADDIE) model. The subjects involved in this study were grade III and IV students at state elementary schools (SD N) in the Semarang city area, with a total of 258 students consisting of 126 male and 132 female students. The ADDIE research and development procedure is shown in detail in the flow chart.
The first stage is the analysis stage. Before a diagnostic assessment is developed, a needs analysis and material analysis are first carried out. The analysis was carried out on primary school teachers and elementary school students. Activities at this stage include collecting information such as teacher needs analysis, student needs analysis, cognitive analysis, and analysis of critical thinking skills. The main findings at this stage are that 1) STEM-based diagnostic assessments have not been applied to phase B natural resource materials, and 2) most existing questions measure aspects of memory and therefore cannot be used to measure students' critical thinking ability.
The second stage is the design stage, carried out after analyzing the existing problem. At this stage, researchers begin to design the products to be developed and to define how product quality will be measured. The activities carried out include 1) determining learning outcomes (LO) per phase based on independent curriculum documents, 2) determining the material for the learning defined in the LO, 3) identifying the competencies and content contained in the LO, 4) planning the use of active verbs that correspond to HOTS questions, from C4 to C6 at the predetermined taxonomic level, 5) determining the indicators of the questions, and 6) writing the question items.
The third stage is the development stage. The development product, in the form of question items, is then validated by experts and practitioners before being tested on students. The question items are revised according to expert advice and input, and can be implemented once the product has been declared valid. Validation is carried out in two ways, namely expert and empirical validation. Expert validation is carried out to review the accuracy of the question items in terms of material (substance), construction, and language. Seven expert validators, including lecturers and principal practitioners, carry out expert validation. The scores given by the experts use the criteria shown in Table 1. Data from expert validation were analyzed using the Aiken index obtained from the seven predetermined validators. The Aiken validity index can be calculated using the formula V = Σs / [n(c − 1)], where V is the rater agreement index regarding the validation of question items, s = r − lo is the rater's score r minus the lowest possible rating lo, n is the number of raters, and c is the number of rating categories. The questions compiled are further validated by the seven experts against the criteria indicated in Table 2. Furthermore, the question items are empirically validated. Empirical validation is carried out to test the accuracy of the question items and the reliability of the instrument based on trials with the samples determined in the study (Setemen, 2018). The item validation test was carried out using the Rasch model with the help of the Winstep 3.73 application. The criteria used to check whether a question item is appropriate or inappropriate are the values of outfit mean square (MNSQ), outfit Z-standard (ZSTD), and point measure correlation (Pt Measure Corr).
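As an illustration, Aiken's V for a single item can be computed directly from the formula above. The seven rater scores below are hypothetical, not taken from the study; a 1-5 rating scale is assumed.

```python
def aiken_v(ratings, lo=1, c=5):
    """Aiken's V = sum(s) / (n * (c - 1)), with s = r - lo.

    ratings: scores given by the n raters
    lo: lowest possible rating; c: number of rating categories.
    """
    n = len(ratings)
    s = [r - lo for r in ratings]       # each rater's score above the minimum
    return sum(s) / (n * (c - 1))       # agreement index in [0, 1]

# Hypothetical scores from seven raters on a 1-5 scale
ratings = [5, 4, 5, 4, 4, 5, 4]
v = aiken_v(ratings)  # 24 / 28, about 0.857 -> "valid" under the > 0.4 criterion
```

A V of 0 would mean every rater gave the lowest score, and 1 that every rater gave the highest, so values near 1 indicate strong rater agreement on the item's validity.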
The criteria for items to be considered valid can be seen in Table 3 (Arnold et al., 2018; Ramadhani & Fitri, 2020; Sumintono & Widhiarso, 2015). The fourth stage is the implementation stage: the diagnostic assessment is administered to elementary school students. The participants were Phase B students in the independent curriculum (source) at 2 elementary schools in Semarang City, selected using convenience sampling techniques. The question items were tested in grades 3 and 4 at these 2 schools. The STEM-based assessment test on natural resource materials was tested on 258 students with different abilities. Trials were conducted to determine the extent to which the question items can achieve the desired goal. The quality of the question items developed is tested empirically through these trials.
The last stage is the evaluation stage. In this stage, the question items that have been tested are then analyzed using the Winstep 3.73 application. Each question item is analyzed based on reliability, validity, difficulty level, and bias on certain variables.

Results and Discussion
The assessment test instrument developed is an elaboration of each predetermined learning outcome. The elaboration of learning outcomes produces learning objectives to be used as a reference for writing questions. Question indicators are based on content (material) aspects, competency aspects, and taxonomic levels. Each indicator created is then divided into two questions with different taxonomic levels. The taxonomic level for STEM is C4 to C6, because questions at these levels demand higher-order thinking (Damanik & Irfandi, 2021; Khaldun et al., 2020; Kristanto & Setiawan, 2020; Ndiung & Jediut, 2020; Widarta & Artika, 2021; Yuliandini et al., 2019). Any question containing STEM can measure students' critical thinking ability.
The predetermined learning outcomes produced three learning objectives. The learning objectives were developed into 7 question indicators. Each question indicator was compiled into two questions according to the level of Bloom's taxonomy (Table 4). The results of the expert analysis of the question items, calculated using the Aiken index, are shown in Table 5. In the Aiken index table validated by expert validators, column A is the result of validation based on material (substance), column B is the result of validation based on construction, and column C is the result of validation based on language. The study results showed that the 14 STEM-based questions on natural resource materials for phase B students showed good validation; under the validity criteria, the values fall in the "valid" category (>0.4) (Azhar et al., 2020). The values obtained are 0.77-0.86 for material, construction, and language. The item validity test is used to identify which questions fall under low criteria (Wulandari et al., 2022).
After validation, there were several suggestions for improvement from the experts, namely that the formulation of the subject matter was not clear, spelling needed improvement, and appropriate EYD should be used in the question items. Improvements were then made by formulating the subject matter clearly and using Indonesian properly and correctly according to the appropriate spelling and EYD.
In the next stage, the 14 questions developed were tested on elementary school students with 258 respondents, 126 male and 132 female. This aims to determine the suitability of the model (question validity, difficulty level of the questions, and reliability of the questions). The questions developed were tested in 2 elementary schools in Semarang. Table 6 shows a summary of the data analysis results with the help of the Winstep application. In the Rasch model, discriminating power can be judged from the standard error (SE). An SE value < 0.5 indicates that the discriminating power of the items is good; an SE of 0.5-1 indicates discriminating power that is fair, quite capable of distinguishing; and an SE > 1 indicates that the discriminating power is inadequate (Purniasari et al., 2021). The results of the research show that the item SE indicates good item discriminating power, while the person SE indicates discriminating power that is fair. Table 6 shows that person reliability is sufficient, item reliability is good, and the Cronbach's alpha value is good. The criteria are taken from Sumintono & Widhiarso (2015), who determine item reliability and person reliability based on the following values: <0.67 (weak); 0.67-0.80 (sufficient); 0.81-0.90 (good); 0.91-0.94 (very good); >0.94 (excellent). These values show that the quality of the question items is good, while person quality is in the sufficient category. In addition, the Cronbach's alpha value indicates that the interaction between persons and items as a whole is good. The reliability test was carried out using the Cronbach's alpha formula with a criterion of 0.6 (Tursinawati & Widodo, 2019). A reliability or internal consistency coefficient > 0.7 indicates the questionnaire used is reliable (Puspita et al., 2021).
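The reliability bands quoted from Sumintono & Widhiarso (2015) can be sketched as a small helper; the function name and boundary handling are ours, not part of the Winstep output.

```python
def reliability_category(value):
    """Categorize a Rasch item/person reliability value
    per the bands quoted from Sumintono & Widhiarso (2015)."""
    if value < 0.67:
        return "weak"
    elif value <= 0.80:
        return "sufficient"
    elif value <= 0.90:
        return "good"
    elif value <= 0.94:
        return "very good"
    return "excellent"

# e.g. an item reliability of 0.84 falls in the "good" band,
# a person reliability of 0.72 in the "sufficient" band
```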
Table 7 summarizes the Rasch model analysis, assisted by the Winstep 3.73 application, with data obtained from 258 respondents and 14 question items. Based on the measure values obtained, seven questions fall in the difficult item category and seven in the easy item category. The difficulty level of the question items is reviewed according to Sumintono & Widhiarso (2015: 70), who group it into four categories, namely < -1 (very easy items); -1 to 0 (easy items); 0 to 1 (difficult items); > 1 (very difficult items).
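The four difficulty bands can be sketched as follows; this is an illustration only, with hypothetical logit values, and the band labels follow the rendering of the categories given above.

```python
def difficulty_category(measure):
    """Band a Rasch item difficulty (logit measure) into four groups,
    following the categories quoted from Sumintono & Widhiarso (2015: 70)."""
    if measure > 1:
        return "very difficult"
    elif measure > 0:
        return "difficult"
    elif measure > -1:
        return "easy"
    return "very easy"

# Hypothetical logit measures, not the study's Table 7 values
for m in (1.3, 0.4, -0.6, -1.2):
    print(m, difficulty_category(m))
```

In a Rasch analysis the item measures are centered at 0 logits, so positive measures mark items harder than average and negative measures mark items easier than average.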
The analysis results show that the question instruments developed are valid. This is confirmed by the infit MNSQ value in the data analysis results, which shows 1.12 ± 0.89; the fit condition is met if the infit MNSQ value is 0.77-1.33 (Suparman, 2020). Thus it can be concluded that the STEM-based question instrument developed is valid. The outfit mean square (outfit MNSQ) values of 0.86 to 1.17 also meet the fit criteria; the fit requirement is 0.5 < MNSQ < 1.5. The test instruments are therefore appropriate to measure students' ability to think critically about natural resource materials.
A question item is considered fit if it meets the following conditions: an outfit MNSQ value between 0.5 and 1.5; an outfit ZSTD value between -2.0 and 2.0; and an item correlation with the total score (point measure correlation) between 0.4 and 0.85 (Sumintono & Widhiarso, 2014). An item falling outside one or more of these ranges is considered a misfit. Testing the validity of question items uses the Pt Measure Corr value, with the valid criterion met if 0.4 < Pt Measure Corr < 0.85; an item shows a polarity problem when its value does not meet the required range (Ramadhani & Fitri, 2020). Question item number 8, with a Pt Measure Corr value of 0.38, is therefore stated to be invalid. Meanwhile, the outfit MNSQ and ZSTD values in Table 7 are 0.86-1.17 and -1.7 to 1.6, which means the developed question items are otherwise valid and accepted.
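The three fit criteria above can be combined into a single check. This is an illustration of the decision rule, not Winstep's internal logic; item 8's Pt Measure Corr of 0.38 is from the study, while its MNSQ and ZSTD values here are hypothetical.

```python
def is_fit(outfit_mnsq, outfit_zstd, pt_measure_corr):
    """True if the item satisfies all three Rasch fit criteria quoted above."""
    return (0.5 < outfit_mnsq < 1.5          # outfit mean square in range
            and -2.0 < outfit_zstd < 2.0     # outfit Z-standard in range
            and 0.4 < pt_measure_corr < 0.85)  # point measure correlation in range

# Item 8 has Pt Measure Corr = 0.38 (MNSQ/ZSTD here are hypothetical)
print(is_fit(1.05, 0.3, 0.38))  # False: fails the 0.4 < corr < 0.85 range
print(is_fit(1.05, 0.3, 0.62))  # True: all three criteria satisfied
```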
An item or measurement instrument is biased if it favors individuals with specific characteristics while individuals with the opposite characteristics are disadvantaged. The bias of a test can be interpreted as invalidity or systematic error in measuring members of a group under study (Darmana et al., 2021). A question item is considered biased if the probability value is below 5% (0.05) (Burhani et al., 2022). Figure 1 and Figure 2 show the distribution of answers to the questions by gender and phase. In Figure 1, the distribution of the difficulty level of the questions by gender can be seen. The test takers consisted of two different groups, namely the male and female groups. Character identification needs to be done because it can affect the achievement of learning outcomes. The differential item functioning (DIF) value is used to find bias in the instrument. Figure 1 shows that male and female students experience no bias and have comparable average ability in answering the tested questions. In Figure 2, it can be seen that the questions that experience bias (DIF) are questions number 1 and 9. These items are flagged for DIF because their probability value is below 0.05. The graph shows that these items are easier for phase C students than for phase B students. Meanwhile, a curve close to the upper limit indicates a question item with a high difficulty level (question number 9) (Aprilia et al., 2020).

Conclusion
The research produced 14 STEM-based questions on natural resource materials for phase B elementary school students with good quality. This development is used as an initial assessment in implementing the independent curriculum to measure students' critical thinking skills. The question items underwent two kinds of validation: 1) expert validation carried out by seven validator experts covering material, construction, and language, analyzed using the Aiken index, and 2) empirical validation performed with Rasch modeling using the help of the Winstep 3.73 application. The result is a set of valid question items that can be used to measure the critical thinking skills of phase B students in science subjects. Further research can be done to determine the effectiveness of the questions in mapping students' abilities so that it can be used as a reference for carrying out differentiated learning in elementary schools.