Development of Wordwall-Based Assessment Instruments to Measure Higher Order Thinking Skills in Science Material on Temperature, Heat, and Expansion


Introduction
One of the latest education policies issued by the Minister is to replace the national examination (UN) with the national assessment (AN). The national assessment system focuses on evaluating and mapping the education system. The evaluation process applied to students requires higher-order thinking, not just understanding and remembering learning material (Putra et al., 2021). Students must be able to think at a higher level so that they can apply new and prior knowledge to solve problems in new situations (Dermawan et al., 2021). This ability is defined as higher order thinking skills (HOTS).
HOTS can be interpreted as thinking activities at the upper cognitive levels of Bloom's taxonomy, namely analyzing, evaluating, and creating (Sudibyo et al., 2020). The application of HOTS in evaluation aims to improve the quality of students' thinking, especially in the process of solving a problem. HOTS aims to improve cognitive skills at a higher level, especially the skill of analyzing learning materials that are very challenging to understand. The expected outcome of implementing HOTS is that students think more actively, critically, and creatively in order to capture various types of information, solve problems, and make decisions (Fadhil & Rokhimawan, 2020).
In Indonesia, students' HOTS ability is still very low. This is confirmed by the 2018 PISA report, in which Indonesia ranked 71st out of 79 participating countries in science (Hewi & Shaleh, 2020). Based on the PISA results, it can be stated that Indonesia's HOTS level is still low compared to other countries. Limited efforts to optimize HOTS in learning and in learning evaluation have contributed to this low PISA ranking. One factor behind students' low HOTS ability is the lack of HOTS-based evaluation questions, together with teaching materials and media that do not prompt students to think at a higher level (Wahyuni et al., 2021). Interviews conducted at SMPN 2 Maesan, Bondowoso showed that HOTS-based questions are rarely applied and that the majority of teachers are still not used to applying HOTS in the evaluation process. As a result, students are not familiar with HOTS-based questions, which in turn contributes to their low HOTS ability.
HOTS-based evaluation can motivate students to master higher-order thinking skills and to apply their knowledge in finding solutions to new problems. Evaluation instruments containing HOTS questions can train students' ability to solve problems. Widarta & Artika (2021) stated that the characteristics of HOTS problems include transferring one concept to another, requiring the ability to process and use information, drawing relationships among various pieces of information, processing information to overcome problems, and analyzing ideas and information critically.
An evaluation process is therefore needed that is interesting and innovative, so that it motivates student learning and meets the desired learning objectives. Kurniati et al. (2021) stated that one way educators can remain creative and innovative in the evaluation process is to keep up with the times and utilize information and communication technology (ICT). Educators can use ICT in learning evaluation, for example through available and easily accessible applications such as Wordwall. Wordwall is a learning and evaluation medium that can be implemented in the learning process, and it also offers interesting and fun quiz-based games. Choosing Wordwall as a quiz-based assessment medium is a learning strategy that can be used to optimize student learning outcomes. Thus, it is expected that using Wordwall as an assessment instrument can optimize student learning outcomes, especially HOTS ability. Based on the description of the problem above, the purpose of this study is to produce a Wordwall-based assessment instrument that is valid and reliable and that has good discriminating power and an appropriate level of difficulty. The results of this research are expected to provide ideas and information for developing quality HOTS questions.

Methods
This research is development research, which aims to create a specific product and examine it through various testing processes to ensure its usefulness (Destiana et al., 2020). The research design used is the Borg and Gall development model, which has systematic steps and is easy to understand. According to Hidayat et al. (2021), the Borg and Gall development model includes 10 general steps. These steps do not all have to be followed because they can be adjusted to the needs of researchers, so this study uses only 8 stages. The stages of this research include (1) looking for potential and formulating problems, (2) collecting various information, (3) designing products, (4) validating the product design, (5) revising designs, (6) conducting trials for small groups, (7) revising products, and (8) […].
This research was conducted at SMPN 2 Maesan Bondowoso in the even semester of the 2022/2023 academic year. The subjects of the study were all grade VII students, with 25 students for small group trials and 100 students for field trials. Data collection techniques in this study are validation sheets, interviews, observations, tests, and documentation. Data analysis techniques in this study are expert validation analysis, empirical validity (analysis of question items, discriminating power, and level of difficulty), and reliability analysis of the Wordwall-based assessment instrument.
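The empirical validity analysis of question items named above is commonly done by correlating each item score with the total test score and comparing the result with a tabulated critical r value. The following is a minimal Python sketch of that idea, not the Excel/SPSS procedure used in this study. The function names and the sample data are hypothetical, and the critical value 0.811 is the usual tabulated two-tailed 5% value for 6 respondents (df = 4); for a real trial the value must be looked up for the actual number of participants.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A column with no variation cannot be correlated with anything.
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_validity(scores, r_table):
    """Return (r_item_total, is_valid) for every item.

    scores  : one list of 0/1 item scores per student (hypothetical layout)
    r_table : critical r looked up for the number of participants
    """
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        r = pearson_r(item, totals)
        results.append((r, r > r_table))
    return results


# Hypothetical answers of 6 students on 3 items (1 = correct, 0 = wrong).
sample_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]

for number, (r, valid) in enumerate(item_validity(sample_scores, 0.811), start=1):
    print(f"item {number}: r = {r:.3f}, valid = {valid}")
```

Note that this sketch keeps each item inside the total score it is correlated with; some analyses instead remove the item from the total first (a corrected item-total correlation).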

Results and Discussion
The research was conducted at SMPN 2 Maesan with all grade VII students as participants. The questions produced in developing the HOTS assessment instrument are implemented in an educational medium, namely Wordwall. The developed Wordwall-based assessment can be accessed through several links, one of which is https://wordwall.net/resource/51904356. The link is sent to the group of each class, and the test can be completed online given good internet access. The tests on Wordwall are divided into 4 sessions according to the number of classes at SMPN 2 Maesan, namely classes A, B, C, and D. The development of Wordwall-based assessment instruments using various templates can be seen in Table 1. After the assessment instrument was developed, it was validated by 3 expert validators, who are science teachers at SMPN 2 Maesan. Validity relates to the ability of an assessment tool to measure the topic being studied by students, ensuring that it accurately measures what it should. Suseno (2021) stated that assessment instruments function to determine students' learning achievement, so it is necessary to ensure that the tools measure accurately and validly. The results of expert validation of the assessment instrument can be seen in Table 2.
Based on the results of expert validation of the Wordwall-based assessment instrument, it can be stated that the instrument is valid and suitable for use in evaluating science learning on temperature, heat, and expansion. To determine the feasibility of the instrument, the expert validators assessed the aspects of material, construction, and language, with a total of 15 indicators (Alfiana et al., 2021). The validators gave several suggestions and inputs that the researchers needed to incorporate to improve the quality of the assessment instrument. The suggestions and feedback are described in Table 3. One recorded note was that the answer choices all used the same unit, namely kcal, with the suggestion to change the unit of the answer choices to kJ or joule.
In addition to expert validation, the assessment instrument must also be tested in small group trials to determine the validity of the question items. Good-quality question items provide a strong basis for interpreting the test with a high degree of validity. Valid instruments also yield valid data (Mamik, 2015), so valid questions produce precise data, while invalid questions are revised until they become valid. Small group trials were conducted with 25 students, and the test results were analyzed for question item validity, reliability, discriminating power, and level of difficulty. The analysis was performed using Microsoft Excel and SPSS. The validity test results for the question items can be seen in Table 4. The calculated r value is compared with the r table value at the 5% significance level for the number of participants who took the test. If the calculated r is greater than the r table value, the question item is valid (Waminton, 2015). Question items that are declared invalid are revised so that they can be reused in field trials.
Furthermore, a reliability test was carried out using the Cronbach's alpha formula, which resulted in a reliability coefficient of 0.741. Based on the reliability criteria, this small group result falls in the high category, in the range of 0.70-0.90 (Waminton, 2015). Question items that are considered valid and reliable still have to be analyzed for discriminating power and difficulty (Azizah et al., 2020). The results of the discriminating power test and the difficulty test of the question items are recapitulated in Table 5. Based on the discriminating power test conducted using SPSS, some question items are still categorized as bad, so they need to be revised so that the items can separate students who understand the material well from those who do not. A question item is recognized as having poor discriminating power when the question can be responded […] has a proportion of questions that are evenly divided between easy and difficult questions (Indriaty & Setyoko, 2018). The results of field trials have a composition that is in accordance with theory, with 17 questions in the easy and medium categories and 3 difficult questions.
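The reliability, difficulty, and discriminating power statistics discussed above can be illustrated with a short Python sketch. It is not the SPSS procedure used in this study: the function names and sample data are hypothetical, and the upper/lower group split used for the discrimination index is one common convention, assumed here only for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items score table."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def difficulty_index(scores, j):
    """Proportion of students who answered item j correctly (0/1 scoring)."""
    return sum(row[j] for row in scores) / len(scores)


def discrimination_index(scores, j, fraction=0.5):
    """Difference in item-j success rate between upper and lower groups.

    Students are ranked by total score; `fraction` sets the size of each
    group (assumed convention; 0.27 is another frequently used value).
    """
    ranked = sorted(scores, key=sum, reverse=True)
    size = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:size], ranked[-size:]
    p_upper = sum(row[j] for row in upper) / size
    p_lower = sum(row[j] for row in lower) / size
    return p_upper - p_lower


# Hypothetical answers of 6 students on 3 items (1 = correct, 0 = wrong).
sample_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]

print(f"alpha = {cronbach_alpha(sample_scores):.3f}")
for j in range(3):
    p = difficulty_index(sample_scores, j)
    d = discrimination_index(sample_scores, j)
    print(f"item {j + 1}: difficulty = {p:.2f}, discrimination = {d:.2f}")
```

A difficulty index near 1 indicates an easy item and a value near 0 a difficult one, while a discrimination index near 0 means the item does not separate stronger from weaker students. The category cut-offs applied to these indices vary between references and are not fixed by this sketch.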

Conclusion
Based on the analysis of the research that has been conducted, the assessment instrument is considered valid and suitable for use after being tested for validity, reliability, discriminating power, and level of difficulty. The validity obtained is 80.99%, the reliability of the instrument is 0.770, the questions have good discriminating power, and the level of difficulty across the whole set of questions is evenly distributed. It can therefore be concluded that the assessment instrument developed can be used in the evaluation of science learning to measure higher order thinking skills.

Figure 1. Borg and Gall development research pipeline

Table 2. Expert validation results

Table 3. Results of suggestions and feedback

Table 4. Test results of validity of small group test items

Table 5. Results of recapitulation of discriminating power and difficulty level of question items in small group trials