Friday, September 28, 2012

Prize quiz: an IRT (Rasch) question (with reviewer comments and responses)

Reviewer comment on the ADL CAT manuscript:
Based on the difficulty parameter, why is playing chess/cards more difficult than gainful work? It seems to me that the result of this analysis suffered from a common problem with Rasch analysis, namely that frequency of performance is confused with difficulty of performance.  Please see Whiteneck G, Dijkers MP. Difficult to measure constructs: conceptual and methodological issues concerning participation and environmental factors. Arch Phys Med Rehabil. 2009 Nov;90(11 Suppl):S22-35 on the limitations of Rasch analysis in the context of participation (or instrumental ADL).

As shown in the figure below, the Rasch analysis ranks item difficulty as Reading > Work > Preparing meal > Heavy housework, which seems to contradict logic and experience!



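Please explain this in Chinese, including its clinical implications.
Submit by 9/26.
Answers may be submitted individually or as a team.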

image source: https://encrypted-tbn3.google.com/images?q=tbn:ANd9GcR-MLm6Mt_SX9T30vZrnUDcpR7dRotc6vbVBMn-qVW3g9pflovl


Comment: The results of table 2 are intriguing. Why is playing chess/cards more difficult than gainful work? The same for art activities? It seems to me that the result of this analysis suffered from a common problem with Rasch analysis, namely that frequency of performance is confused with difficulty of performance.  Please see Whiteneck G, Dijkers MP. Difficult to measure constructs: conceptual and methodological issues concerning participation and environmental factors. Arch Phys Med Rehabil. 2009 Nov;90(11 Suppl):S22-35 on the limitations of Rasch analysis in the context of participation (or instrumental ADL).

* Response: It was our mistake to state that “The items are largely listed in increasing level of difficulty.” We have deleted the statement. The items’ level of difficulty cannot be determined simply by their order (or by their step difficulty parameters) for two reasons. First, the items have different numbers of difficulty steps (different numbers of response categories). Second, the discrimination parameter also affects the hierarchy of item difficulty. That is, some items might be more difficult at certain levels of function but easier at other levels. This is a potential drawback of the 2-parameter model. We have addressed this issue as follows:
 “We used the GPCM (a type of 2-parameter IRT model, including discrimination and difficulty parameters)34 to fit the patients’ responses and estimate the item parameters for the ADL item bank. The GPCM is a flexible member of the family of parametric, unidimensional, polytomous-response IRT models. Because the GPCM allows discrimination to vary from item to item, it generally fits response data better than a 1-parameter model (e.g., the Rasch model).35 However, using a 2-parameter IRT model may have made the results difficult to interpret. For example, an item’s level of difficulty cannot be determined simply by its “difficulty” parameter (i.e., step threshold) because the discrimination parameter (i.e., slope) also affects the hierarchy of item difficulty. That is, some items might be more difficult at certain levels of function but easier at other levels. This is a potential drawback of the 2-parameter model. Nevertheless, the 2-parameter IRT model is recommended for its flexibility and has been adopted by several health-related CATs.12, 15, 36” 
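The response states that the discrimination parameter changes the difficulty hierarchy. A minimal Python sketch of GPCM category probabilities can illustrate this. The two items and all of their parameter values are hypothetical and are not estimates from the ADL item bank. The function names `gpcm_probs` and `expected_score` are our own labels. Item A is steep (a = 2.0) and even has the lower step parameters. Item B is flat (a = 0.5). Which item is "harder" depends on where on the function scale you look.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = k | theta), k = 0..len(steps), under the GPCM.

    a is the item's discrimination (slope); steps are its step parameters.
    """
    # Category k uses the cumulative sum of a * (theta - b_v) for v = 1..k;
    # category 0 has an empty sum of 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, a, steps):
    """Expected item score at ability theta (higher means the item is 'easier' there)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


# Hypothetical 3-category items: A is steep, B is flat with higher step parameters.
item_a = {"a": 2.0, "steps": [0.0, 1.0]}
item_b = {"a": 0.5, "steps": [0.5, 1.5]}

for theta in (-3.0, 0.0, 3.0):
    ea = expected_score(theta, item_a["a"], item_a["steps"])
    eb = expected_score(theta, item_b["a"], item_b["steps"])
    easier = "A" if ea > eb else "B"
    print(f"theta = {theta:+.1f}: E[A] = {ea:.2f}, E[B] = {eb:.2f}, easier item: {easier}")
```

With these values, item B has the higher expected score at θ = -3 and item A has the higher expected score at θ = +3. The curves cross, so no single ordering of the two items holds across all levels of function.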

Furthermore, “Even if the discrimination parameter is fixed, an item’s “difficulty” parameter value must be interpreted with caution. For example, our results showed that the difficulty parameter of “volunteer work” (3.54) was unexpectedly higher than that of “gainful work” (2.53) (Table 2). The value of the difficulty parameter is commonly interpreted as the “level of difficulty or challenge” of a task. However, the difficulty parameter of an item is influenced by how often patients perform the task.22 Appendix 1 shows that only a small proportion of the patients had been working (gainful work, 7.1%; volunteer work, 3.1%). Such frequency does not necessarily indicate the level of difficulty of a task. In particular, the patients tested were all outpatients, who were unlikely to seek and do volunteer work. On the other hand, gainful employment is often an obligation or responsibility. Thus, the value of the difficulty parameter is not equivalent to the level of challenge of an ADL task.37 In addition, as mentioned above, we used a 2-parameter model, not a 1-parameter IRT model. Because of these issues, our data are insufficient to determine whether an item with a low difficulty value (e.g., doing gainful work) is less challenging than an item with a high difficulty value (e.g., doing volunteer work).” 
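The sketch below shows how frequency alone can drive a "difficulty" value. It is a crude illustration that makes several assumptions. Each item is treated as dichotomous (performed vs. not performed), and the sample is assumed to be centred at 0 on the logit scale with no spread. Under those assumptions, difficulty reduces to the log-odds of not performing the task. This is a simplification for illustration only, not the GPCM estimation used in the manuscript. Only the Appendix 1 proportions (7.1% and 3.1%) come from the text above, and the function name `crude_difficulty` is hypothetical.

```python
import math


def crude_difficulty(p_performing):
    """Crude logit 'difficulty': log-odds of NOT performing the task.

    Assumes a dichotomous item and persons centred at 0 with no spread; this is an
    illustration only, not how the GPCM estimates in Table 2 were obtained.
    """
    if not 0.0 < p_performing < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log((1.0 - p_performing) / p_performing)


# Proportions of patients who had been working, as reported from Appendix 1.
working = {"gainful work": 0.071, "volunteer work": 0.031}

for task, p in working.items():
    print(f"{task}: {p:.1%} performing -> crude difficulty {crude_difficulty(p):.2f}")
```

Under these assumptions, the rarer task (volunteer work) gets the higher value simply because fewer patients do it. This holds whether or not volunteer work is actually more challenging.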

Your further comments are appreciated!

16 comments:

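  1. Professor, are the FAI items and administration procedure in the CAT the same as in the paper FAI?
    The paper version of the Chinese FAI asks how often the client performs IADL activities, not his or her ability or perceived difficulty. So I would like to clarify: does "difficulty" in your post refer to how often the client performs the item? ("the difficulty of Reading > Work > Preparing meal > Heavy housework")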

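  2. This reviewer's challenge really hits the mark. In addition, Whiteneck et al. argue that IRT concepts cannot account for a multidimensional construct such as participation. This is especially true because each item carries an "individually specific" meaning for each client. For example, how often people read already varies from person to person. When reading is placed on IRT's single hard-to-easy continuum, these marked individual differences are easily diluted or magnified. Moreover, few people in Taiwan take up reading as a hobby, so the IRT model places it near the top in terms of difficulty.

    Conversely, work is done every day, so it occurs more often than reading. As a result, its IRT-estimated difficulty is not very high.

    Possible solutions I can think of:

    1. Emphasize that IRT is used to analyze construct validity and that the ordering of item difficulty is secondary.

    2. Change all wording about "difficulty" to "frequency". Items at the top are those performed less often, and items at the bottom are those done (or doable) almost every day. In terms of what each item means, all of them are useful for the CAT. For example, if a client can find time in daily life (or even in a busy schedule) to read, his or her IADL participation in work and other housework should already be well established. Besides, your clients are stroke patients. Although reading is a very simple activity, few patients still have the leisure or inclination to read after their illness, particularly in Taiwanese culture.

    These are my initial thoughts. Good luck with the revision!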

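  3. Dear Professor Hsieh:
    Doesn't the Frenchay Activities Index measure the "frequency" of participation?
    So in this context, doesn't Rasch item difficulty represent precisely frequency?
    PS: This is just my naive take; please set me straight!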

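  4. Yes, the FAI items measure the frequency of participation/performance.
    The Rasch model is built on how often something occurs. If all items measure the same dimension, they can be ordered by frequency. In education and testing (the original fields of application for Rasch and IRT), this ordering represents difficulty: the fewer people who "pass" an item, the more difficult it is. That is why the main parameter in the Rasch model is "difficulty".
    Other IRT models also include parameters such as discrimination and guessing.

    Thank you all for participating!
    Some of the comments are already close to my own view and would make the ideal reply. I am holding them back for now and have not published them.

    Please keep submitting your "answers"!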

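    1. Thank you for your reply! I suppose this is part of the reason why the ICF further distinguishes between capacity and performance within its activities and participation components. This seems close to the view in your paper below, Professor Hsieh?
      The diverse constructs use of activities of daily living measures in stroke randomized controlled trials in the years 2005-2009. Hsieh CL, Hoffmann T, Gustafsson L, Lee YC. J Rehabil Med. 2012 Aug 23;44(8):720-6.
      So, when writing questionnaire items and response options, should we make sure they all reflect the same concept under participation? And should we first test unidimensionality with CFA or a Rasch analysis?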
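Comment 4 notes that the Rasch model centres on item difficulty, whereas other IRT models also include discrimination and guessing parameters. For reference, the standard dichotomous forms are sketched below in common textbook notation. The symbols are our own labels: θ_n is person ability, b_i is item difficulty, a_i is discrimination, and c_i is guessing. The optional scaling constant D is omitted.

```latex
% Rasch (1-parameter logistic): item difficulty b_i only
P(X_{ni}=1 \mid \theta_n) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

% 2-parameter logistic: adds item discrimination a_i
P(X_{ni}=1 \mid \theta_n) = \frac{\exp[a_i(\theta_n - b_i)]}{1 + \exp[a_i(\theta_n - b_i)]}

% 3-parameter logistic: adds a lower asymptote c_i for guessing
P(X_{ni}=1 \mid \theta_n) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta_n - b_i)]}{1 + \exp[a_i(\theta_n - b_i)]}
```

Setting a_i = 1 and c_i = 0 recovers the Rasch model. Under Rasch, b_i is the ability level at which the probability of scoring 1 is 0.5. For a given sample, therefore, an item that fewer people "pass" ends up with a higher b_i.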

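  5. The figure you posted includes the item BI2-Bathing. As I recall, the Barthel Index (BI) rates level of independence (independent, needs assistance, etc.), which seems different from the FAI (frequency). This may be another point that puzzled the reviewer (difficulty vs. frequency). This is just my guess. You may want to avoid this issue or explain it specifically in your response.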

  6. This figure is only an example. In the actual ADL CAT, the response categories of all items (both BADL and IADL) are levels of independence.

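    1. We (恩琦, 姿誼, 雅珍, 菀薈) think:
      In the ADL CAT scoring, a score of 0 on an IADL item can mean two things: the client cannot do the task independently, or the client does not do it. However, not doing a task does not mean the client lacks the ability. He or she may simply not be in the habit of doing it.

      Clinical implication: if a therapist compares the scores of two clients, he or she cannot tell whether the lower-scoring client does not perform the IADL tasks or performs them with poor independence. Nor can the therapist tell which items the client did not do or needed help with.

      Suggestion: the ADL CAT item bank should score "does not perform" separately from "cannot perform independently". For example, "does not perform" could be recorded as missing (and shown as X in the CAT results), and "cannot perform independently" as 0. This works because a Rasch model can still estimate the client's level from the items that were scored.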

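    2. Sorry, one addition to our suggestion:
      the assessment results could list which items the client answered, not just their item numbers, so that therapists can target training to the client's problems.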

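    3. To all: This is only weakly related to my question, isn't it?
      For someone who does not perform a task, we cannot know how he or she would actually perform that ADL anyway, can we? Besides, IRT scoring handles this without any problem: a person who does not perform the task scores 0.

      To 姿誼: A CAT administers only a limited number of items, so your suggestion may have limited benefit. The purpose of a CAT is rapid assessment, and its value for treatment planning is limited.
      Also, your additional suggestion is far removed from the original question.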

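    4. We originally thought the ordering of item difficulty was caused by non-performers scoring 0, which is why we answered as we did. Judging from your response to the reviewer, however, that does not seem to be the main reason.
      Also, I am not very clear on why the discrimination parameter affects the ordering of item difficulty.
      Is it because discrimination (the slope) represents the range of ability levels an item can distinguish, and that range affects the ordering?

      As for the additional suggestion, our clinical-implication point mentioned that therapists do not know which items the client answered, so we added the suggestion to echo that point.
      However, given the purpose of a CAT, exactly which items led to a client's score may not be that important.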

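    5. * Your comment contains many instances of "seems" and "may", so both your position and your conclusion are uncertain, which is confusing.
      * Try listing exactly which concepts are unclear and discuss them with your classmates. If you still have questions, come to me for clarification.
      * How did I explain why the discrimination parameter affects the ordering of item difficulty? Which part do you not understand?
      * Issues that are "far removed" should be raised separately, so as not to derail the discussion.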
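The team's suggestion in reply 1 under comment 6 is to record "does not perform" as missing rather than 0. That suggestion rests on the claim that a Rasch model can still estimate a person's level from the items that were scored. Below is a minimal sketch of that claim. It assumes dichotomous Rasch items with known difficulties and uses maximum-likelihood estimation by Newton-Raphson. The item difficulties, response patterns, and function names are hypothetical, and this is not the ADL CAT's scoring algorithm.

```python
import math


def rasch_p(theta, b):
    """Probability of a score of 1 on a dichotomous Rasch item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, iters=50):
    """Maximum-likelihood ability estimate; None marks a missing response.

    Missing items are simply left out of the likelihood. The MLE is finite only
    if the observed scores are neither all 0 nor all 1.
    """
    observed = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    if not observed:
        raise ValueError("no observed responses")
    total = sum(x for x, _ in observed)
    if total == 0 or total == len(observed):
        raise ValueError("MLE is infinite for an all-0 or all-1 pattern")
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for _, b in observed]
        gradient = sum(x - p for (x, _), p in zip(observed, ps))  # d logL / d theta
        information = sum(p * (1.0 - p) for p in ps)             # -d2 logL / d theta2
        theta += gradient / information                          # Newton-Raphson step
    return theta


# Hypothetical item difficulties (logits) and one person's responses.
difficulties = [-1.0, 0.0, 1.0, 2.0]
as_missing = [1, 1, 0, None]  # last task not performed, recorded as missing
as_zero = [1, 1, 0, 0]        # same task recorded as 0

print(f"not performed = missing: theta = {estimate_theta(as_missing, difficulties):.2f}")
print(f"not performed = 0:       theta = {estimate_theta(as_zero, difficulties):.2f}")
```

Comparing the two estimates shows how much the choice between "missing" and "0" can move a person's score. The code does not settle the substantive question debated in the thread, namely whether non-performance should count as 0.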

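  7. Professor, excuse me, but I have some doubts about the argument "Particularly, the patients tested were all outpatients, who were unlikely to seek and do volunteer work."
    Outpatients should have fairly flexible schedules (rehabilitation a few days a week), so if they have recovered well, they might still be able to volunteer. I wonder whether volunteering simply depends on personal interest, so that the proportion of people who volunteer is low to begin with. Even chronic stroke patients who have recovered quite well may still feel physically limited, so their proportion could be lower still. These are my thoughts; I would appreciate your comments. Thank you!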

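  8. I did not cite a reference, so "unlikely" is only a tentative, subjective judgment.
    There is no need to worry about it.
    If I deliberately looked for supporting literature, I do not think it would be hard to find.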

  9. Professor, what is the discrimination parameter?
    The name sounds like discriminant validity, so why would this parameter affect the ordering of item difficulty?
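A short sketch may help with this question. In a 2-parameter logistic model, the discrimination parameter a is the steepness of the item characteristic curve. Its slope at θ = b equals a/4. It is an item property, not discriminant validity. When two items have different slopes, their curves can cross, so their order of "difficulty" flips depending on the ability level. The items, parameter values, and function names below are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL item characteristic curve: P(score 1 | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_slope(theta, a, b, h=1e-5):
    """Central-difference estimate of the curve's slope at theta."""
    return (p_2pl(theta + h, a, b) - p_2pl(theta - h, a, b)) / (2.0 * h)


# Discrimination is steepness: at theta = b the slope is a / 4.
for a in (0.5, 2.0):
    print(f"a = {a}: slope at theta = b is {icc_slope(0.0, a, 0.0):.3f} (a / 4 = {a / 4:.3f})")

# Hypothetical items: X is steep with the LOWER difficulty, Y is flat.
x = {"a": 2.0, "b": 0.0}
y = {"a": 0.5, "b": 0.5}
for theta in (-2.0, 2.0):
    px, py = p_2pl(theta, **x), p_2pl(theta, **y)
    print(f"theta = {theta:+.1f}: P(X) = {px:.2f}, P(Y) = {py:.2f}, harder item: {'X' if px < py else 'Y'}")
```

With these values, X is harder than Y at θ = -2 but easier at θ = +2, even though X has the lower b. Under the Rasch model all slopes are equal, so the curves are parallel and cannot cross in this way.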
