The Patient Engagement Evaluation Tool was valid for clinical practice guideline development

Objective: To evaluate the reliability and validity of the six- and 12-item Patient Engagement Evaluation Tool (PEET), which informs guideline developers about the quality of patient and public involvement activities. Study Design and Setting: The PEET-12 and three embedded validation questions were completed by patients and members of the public who participated in developing 10 guidelines between 2018 and 2020. Confirmatory factor analysis (CFA) was used to assess the validity of a single-dimension factor structure. Cronbach's alpha and Pearson correlations were calculated for internal consistency reliability. Concurrent validation was used to test construct validity. Results: A total of 290 participants completed the PEET-12. To improve tool efficiency, six of the 12 items were retained in the final tool (PEET-6), based on results indicating redundancy from the initial item analysis and experts' review. For the PEET-6, CFA supported the single-factor structure (χ2(15) = 5173.4, P < 0.001, Tucker-Lewis Index = 1.00, Comparative Fit Index = 0.99, Root Mean Square Error of Approximation = 0.08). The correlation between the total score for the three validation questions and the PEET-6 total score was 0.71, 95% CI [0.65, 0.77], supporting construct validity. Conclusion: The PEET-6 and PEET-12 are valid tools to measure patient and public involvement in the setting of clinical practice guideline development.


Introduction
Meaningful patient and public involvement (PPI) in guideline development is an ethical imperative for developing trustworthy guidance. It is stipulated by the Guidelines International Network [1] and the Institute of Medicine-US (now the National Academy of Medicine) [2] and emphasized in guideline quality appraisal standards (e.g., the Appraisal of Guidelines for Research & Evaluation Instrument) [3]. Guidelines developed with patient involvement are more likely to address patient preferences, provide recommendations that are better tailored to individual needs, and better support clinical decision making, particularly when practitioners perceive incongruency between patient preferences and the guideline recommendations [4,5].
Guideline developers worldwide, including the Canadian Task Force on Preventive Health Care (CTFPHC) [6], United States Preventive Services Task Force [7], Scottish Intercollegiate Guidelines Network [8], and National Institute for Health and Care Excellence [9], undertake strategies to involve patients and the public in guideline development. Such strategies have, in some cases, been criticized as tokenistic and as potentially contributing to inequity in guideline recommendations [10,11], emphasizing the need for guideline developers to evaluate the quality of their engagement activities [12].
The Patient Engagement Evaluation Tool (PEET) was developed as a theory-informed measure of the extent to which criteria for successful engagement are met across domains (trust, respect, fairness, competency, legitimacy, accountability) from a participant's perspective [13] . PEET was applied to evaluate knowledge user engagement during the development of a systematic review of geriatrician-led models of care [12] and during guideline development by the CTFPHC [7] , which produces clinical practice guidelines on primary preventive health care. The objectives of this project were to evaluate the reliability and validity of the PEET and to determine if it could be shortened without substantively changing measurement characteristics.

Methods
This cross-sectional study evaluated the factor structure, reliability, and validity of the 12-item PEET; the selection of items for a shortened six-item version; and similar testing of the six-item version. Data were collected from members of the public who provided input into the development of 10 CTFPHC guidelines and completed the 12 PEET items between 2018 and 2020.

Participants and engagement activities
Between 10 and 26 individuals were recruited per guideline, with attempts to include people from each Canadian province and territory. Participants were recruited through advertisements on public websites (e.g., Kijiji, Craigslist), the CTFPHC website, the website of the Knowledge Translation Program (KTP) of St. Michael's Hospital (Toronto, Ontario, Canada), and from a KTP database of individuals who had expressed interest in providing feedback on CTFPHC guidelines and tools [7,13]. People expressing interest completed an online eligibility survey containing demographic, health, health equity, and conflict of interest questions.
Participants representing the guideline target population were engaged at two stages of guideline development, and the 12-item PEET was completed after each stage. In stage 1, participants used the Grading of Recommendations Assessment, Development and Evaluation outcome rating approach [14] to rate a series of predefined screening outcomes (benefits and harms) as either not important (rating 1-3), important (rating 4-6), or critical (rating 7-9) for making decisions relevant to the guideline topic. For example, reduced risk of infection transmission due to screening for chlamydia and gonorrhea was an outcome rated by participants. Participants were also asked to list other outcomes they deemed important. This was followed by an online moderated focus group where participants discussed their outcome ratings.
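As a rough illustration of the three-band GRADE classification described above, the 1-9 ratings can be mapped to importance categories with a simple function (a sketch for clarity only, not part of the study's materials):

```python
def grade_importance(rating: int) -> str:
    """Map a 1-9 GRADE outcome rating to its importance band:
    1-3 = not important, 4-6 = important, 7-9 = critical."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating <= 3:
        return "not important"
    if rating <= 6:
        return "important"
    return "critical"
```

For example, a participant rating an outcome as 8 would place it in the "critical" band for decision making.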
For stage 2 of each guideline input process, a different group of participants was provided with the evidence summary from the systematic review to evaluate their preferences when considering undergoing (or not undergoing) a screening intervention for a specific health condition (such as colon cancer). Participants used a 9-point scale via an online survey to rate the extent to which each outcome would influence their decision to be screened for the health condition (1 = This isn't important for my decision at all to 9 = This is very important for my decision) [15]. Consent was obtained, and participants took part in a 60-minute moderated, recorded focus group via teleconference, which included a CTFPHC content expert to answer any questions, to discuss the survey outcomes and general screening preferences. One week after the focus group, these participants completed an online survey to assess their engagement (PEET) and experience with this project stage. Details about data collection are available in previous publications [6,13].

The PEET
The PEET tool was designed to quantify the level of participant engagement in clinical practice guideline development using theory-informed meta-criteria, or domains, from a stakeholder perspective [13]. The meta-framework was based on democratic participation principles [16]. The tool gauges participants' opinions regarding the extent to which each attribute was present during their engagement activity across six domains: trust, respect, accountability, legitimacy, competency, and fairness [13].
The original 12-item PEET included two items for each of the six domains, except fairness, which has three items, and trust, which has one item. Items were rated on a seven-point adjectival Likert scale (ranging from 1 = no extent to 7 = very large extent). Survey items (see Appendix A) were tailored to the engagement activities employed; for example, "To what extent do you believe that your ideas were heard during the engagement process?" Respondents were asked to explain their choice (text entry) if they rated any item 1 to 4. The scale score was the total of all items, with higher scores reflecting greater engagement.

Validation Questions
Three validation items embedded in the survey (see Appendix A) assessed concurrent validity [17] by evaluating convergence between the overall construct, degree of engagement, and the extent to which participants believed that (1) their values were reflected in the final conclusions of the patient and public engagement activity, and their degree of 'buy-in' with the engagement process, as measured by their intent to (2) follow the health recommendations they participated in developing and (3) advise others to follow those health recommendations. For consistency, validation items were also rated on a seven-point adjectival scale (ranging from 1 = no extent to 7 = very large extent). We hypothesized that high levels of overall meaningful engagement (total PEET scores) would be associated with high scores on these validation items.

Measure Shortening
Two investigators with experience in guideline development and patient engagement (AM, RG) initially selected one item from each of the six PEET domains of the 12-item version for inclusion in the shortened six-item version. The selected items were deemed to have better face validity and were discussed and agreed upon via a consensus process with other research team members. We created and tested a shortened version in response to patient suggestions to consider response burden.

Statistical Analysis
All analyses (descriptive statistics, reliability assessment, factor analysis, and assessment of concurrent validity) were carried out for both the 12-item PEET and the shortened six-item PEET after item reduction.
Means and standard deviations (SDs) summarized continuous demographic variables, and percentages were used for categorical variables. For each PEET item, the mean, SD, frequency of endorsement of each response option, and corrected item-total correlation were calculated. Means and SDs were also calculated for total scale scores. Floor and ceiling effects were examined, defined as ≥15% of participants having the lowest or highest possible score, respectively [18,19].
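The floor/ceiling check described above is straightforward to compute. The sketch below (illustrative only, not the study's actual code) applies the ≥15% threshold to a set of total scores:

```python
import numpy as np

def floor_ceiling_effects(total_scores, min_score, max_score, threshold=0.15):
    """Return the proportion of respondents at each scale extreme and
    flag a floor/ceiling effect when either proportion is >= threshold."""
    scores = np.asarray(total_scores, dtype=float)
    floor_prop = float(np.mean(scores == min_score))
    ceiling_prop = float(np.mean(scores == max_score))
    return {
        "floor_prop": floor_prop,
        "ceiling_prop": ceiling_prop,
        "floor_effect": floor_prop >= threshold,
        "ceiling_effect": ceiling_prop >= threshold,
    }
```

For the 12-item PEET, `min_score` and `max_score` would be 12 and 84 (12 items × ratings of 1-7); for the six-item version, 6 and 42.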
Inter-item correlations were calculated, and Cronbach's alpha was used to assess the internal consistency of the PEET. We planned a priori to consider item reduction, to improve tool efficiency and decrease participant burden [18,19], if the internal consistency of the 12-item version was greater than 0.95, signaling item redundancy [17].
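Cronbach's alpha and the corrected item-total correlations used in this item analysis follow standard formulas; a minimal sketch (assuming an n-participants × k-items score matrix, not the study's actual code):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total)."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items
    (the 'corrected' item-total correlation)."""
    X = np.asarray(items, dtype=float)
    total = X.sum(axis=1)
    return np.array([
        np.corrcoef(X[:, j], total - X[:, j])[0, 1]
        for j in range(X.shape[1])
    ])
```

An alpha above the prespecified 0.95 threshold would signal the item redundancy that motivated shortening the tool.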
Construct validity was assessed using confirmatory factor analysis (CFA) and concurrent validation methods. CFA was selected to confirm the validity of a unidimensional structure of item responses, as identified a priori by the developers. Unidimensionality was proposed because the PEET domains and items were closely related and all measured an overall engagement construct. CFA used the weighted least squares estimator with a diagonal weight matrix, robust standard errors, and a mean- and variance-adjusted chi-square statistic with delta parameterization in Mplus 7 [20]. Model adequacy was assessed using a chi-square goodness-of-fit test and three fit indices: the Tucker-Lewis Index (TLI) [21], the Comparative Fit Index (CFI) [22], and the Root Mean Square Error of Approximation (RMSEA) [23]. Because the chi-square test is susceptible to sample size and can lead to the rejection of well-fitting models, the practical fit indices (TLI, CFI, RMSEA) were emphasized [24]. Models with a TLI and CFI close to 0.95 or higher and an RMSEA close to 0.06 or lower represent good-fitting models [25]. Because RMSEA is calculated partially based on chi-square, an RMSEA of up to 0.08 [26] may also be considered to represent a reasonably acceptable model fit. Item response categories were combined for CFA modelling in cases where the distribution of responses was too sparse, including having no responses, across one or more categories [20,27]. Previous studies have found that collapsing categories with few responses in CFA yields scales with roughly equivalent psychometric properties, including factor structure [27]. Pearson correlations (r) with 95% confidence intervals were used to assess the strength of the relationships between participant ratings of the embedded validation questions and total PEET scores [17]. Generally, correlations greater than 0.40 suggest construct validation of the instrument, in this case reflecting participant buy-in and potential uptake of the recommendations [17].
The 95% CIs for the difference between the correlations (r) with each of the three validation items were calculated [28] to compare the 12-item versus the shortened six-item tool.
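The 95% CI for a single Pearson r can be obtained with the standard Fisher z-transform. As an illustrative check (not necessarily the authors' method), applying it to the reported r = 0.71 with n = 290 gives an interval close to the CI reported in the abstract:

```python
import math

def pearson_r_ci(r, n, z_crit=1.959964):
    """95% CI for a Pearson correlation via the Fisher z-transform:
    z = atanh(r), SE = 1/sqrt(n - 3), back-transform with tanh."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Roughly reproduces the PEET-6 interval of [0.65, 0.77];
# small differences may reflect a different CI method in the paper.
lo, hi = pearson_r_ci(0.71, 290)
```

The difference between the dependent correlations of the two tool versions requires a more involved method [28] and is not sketched here.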

Results

Participant characteristics
A total of 304 members of the public provided input on CTFPHC guideline development during the 2-year study period, of whom 299 (98%) completed the PEET. Of these, nine participants did not answer all items; therefore, 290 participants (95%) with complete PEET responses were included. Most respondents were women (72%), had attained a college diploma or bachelor's degree (58%), lived in urban areas (61%), and self-identified as white (67%). The mean age was 45 (SD = 18) years (Table 1).
Correlations between item scores ranged from r = 0.40 (P < 0.01, Items 4 and 8) to r = 0.80 (P < 0.01, Items 11 and 12) (Appendix B1). In addition, corrected item-total correlations are reported in Table 3. No participants had the lowest possible total score (12.0) on the scale, and seven (2.4%) had the highest possible score (84.0), suggesting that there were no floor or ceiling effects.

Item reduction and 6 item Patient Engagement Evaluation Tool
Given the high inter-item correlations and internal consistency (alpha = 0.95) found for the 12-item tool, items 2, 6, 8, 10, 11, and 12 were removed, and a six-item version of the tool (one item for each domain; Fig. 1) was selected for testing [29].
Cronbach's alpha for six item PEET was 0.92, reflecting good internal consistency across scores. No participants had the lowest possible total score (6.0) on the scale, and 9 (3.1%) had the highest possible score (42.0), suggesting that there were no floor or ceiling effects.

Confirmatory Factor Analysis
Confirmatory factor analysis was also performed on the six items to confirm the unidimensional construct of the instrument (Table 3). Inspection of the indices indicated good model fit based on the CFI and TLI, and acceptable fit based on the RMSEA (χ2(15) = 5173.4, P < 0.001, TLI = 1.00, CFI = 0.99, RMSEA = 0.08). All factor loadings were adequate, ranging from 0.75 (Item 4) to 0.92 (Item 1) (Table 3).

Concurrent Construct Validity
The correlation (r) between the total score for the three validation questions and the 12-item version total score was 0.70 (95% CI 0.63, 0.75) vs. 0.71 (95% CI 0.65, 0.77) for the six-item version. Both were greater than 0.40, supporting the construct validation of the instruments [17]. The correlation for the 12-item version was slightly lower than that for the six-item version, but the difference was not statistically significant (r = -0.018, 95% CI -0.076 to 0.004).

Discussion
Patients and members of the public who provided input into CTFPHC guideline development reported high levels of engagement. Clustering of responses was noted at the upper end of the scale, but neither a ceiling nor a floor effect for total scores was found. High inter-item correlations and internal consistency for the 12-item PEET suggested item redundancy, specifically potential conceptual overlap between questions. Consequently, a shorter six-item version of the tool was developed, with similarly good reliability. CFA found a good model fit for both versions of the tool and identified a single dimension in the data. Good measures of concurrent validation were found for both tools, with no difference between versions. Considering the decreased respondent burden and the similar reliability and validity measures, the more economical six-item tool is preferred.
Limitations have been identified with the growing number of tools available to evaluate patient and public engagement in health care policy and research development. These include a lack of validation and evaluation of measurement properties (92% of tools did not report reliability measures), lack of a theory-based framework [30], and lack of specification of the purpose and context of the engagement activity for which the tool is intended [31]. The PEET-6 is an efficient tool that addresses these gaps: it has good measurement properties, is theoretically informed, and is specifically intended to support patient and public involvement in the context of guideline development activities.

Guideline developers face challenges in stakeholder engagement throughout guideline development and have adopted various approaches to incorporate the perspectives of patients and the public. Some have criticized such efforts as tokenistic, identifying lack of participant remuneration, failure to prepare participants adequately (e.g., materials, knowledge), and other barriers to meaningful engagement [11]. Such limitations have been identified as ultimately "fueling inequity in guidelines" [11]. To our knowledge, the PEET is the first tool designed and evaluated for reliability and validity in the context of guideline development. Similar tools in other contexts include a generic 21-item instrument developed by Abelson et al., which is much longer than the PEET-6 [31]. It is intended for broad application in healthcare organizations, provides a qualitative assessment of the engagement process, and is supported by face validity, content validity [2], and usability testing [30]. Item generation was based on a literature review and consensus among engagement experts. Stocks et al. [32] also developed a generic tool, in this case to support healthcare researchers by providing quantitative outcomes of the quality of the engagement process. This theory-informed, 24-item tool has acceptable to good internal consistency (Cronbach's α 0.74-0.81) and discriminatory ability to measure decreasing engagement quality over time (within-subject test-retest). Still, it is limited by a ceiling effect in measuring improved engagement experience over time, and it is also much longer than the PEET-6.
There are limitations to consider in interpreting our results. Future analyses by other guideline developers should consider test-retest reliability (post-engagement) analysis to explore the stability of responses within participants. Inter-rater reliability testing could help determine the tool's capacity to discriminate between types of engagement activities (e.g., focus groups, interviews, surveys) and stages of engagement (e.g., outcome prioritization, recommendation formulation, dissemination activities). Such findings may identify optimal strategies for engagement during guideline development. Reliability and validity testing in other guideline development groups is encouraged to confirm the unidimensionality of the construct and the internal consistency of the items.

Conclusion
Both the PEET-12 and the PEET-6 provide guideline developers with a measure of the overall quality of their patient and public engagement activities, ultimately supporting the development of implementable, meaningful, and equitable clinical practice guideline recommendations. To minimize response burden, guideline developers may prefer the PEET-6.

Acknowledgments
We would like to thank Ms. Danica Buckland for contributing to the data collection and refinement of the 12-item PEET.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Appendix A: Patient Engagement Evaluation Tool (12-item with 3 validation items)
Please respond to each of the following statements using the scales provided.
Please respond to each question using the following ratings: 1: Not at all (no extent) 2: Very small extent 3: Small extent 4: Fair extent 5: Moderate extent 6: Large extent 7: Very large extent.
If you select 1-4 for any question, please explain your rating in the space below the question.

1) To what extent do you believe that your ideas were heard during the engagement process?
2) To what extent did you feel comfortable contributing your ideas to the engagement process?
3) Did organizers take your contributions to the engagement process seriously?
4) To what extent do you believe that your input will influence final decisions that underlie the engagement process?
5) To what extent do you believe that your values and preferences will be included in the final health advice from this process?
6) To what extent were you able to clearly express your viewpoints?
7) How neutral in their opinions (regarding topics) were organizers during the engagement process?
8) Did all participants have equal opportunity to participate in discussions?
9) How clearly did you understand your role in the process?
10) To what extent was information made available to you either prior or during the engagement process so as to participate knowledgeably in the process?
11) To what extent were the ideas contained in the information material easy to understand?
12) How clearly did you understand what was expected of you during the engagement process?
13) How clearly did you understand what the goals of the engagement process were?
14) To what extent would you follow health advice from the Canadian Task Force on Preventive Health Care (if it related to your health condition)?
15) To what extent would you advise others to follow health advice from the Canadian Task Force on Preventive Health Care (if it related to their health condition)?