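謝清麟 (Ching-Lin) teaching/research blog: Prize Q&A, an IRT (Rasch) question (including reviewer comments and our response)

Friday, September 28, 2012

Prize Q&A: an IRT (Rasch) question (including reviewer comments and our response)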

Reviewer comment on the ADL CAT manuscript:
依據 difficulty parameter, why is playing chess/cards more difficult than gainful work? It seems to me that the result of this analysis suffered from a common problem with Rasch analysis, namely that frequency of performance is confused with difficulty of performance. Please see Whiteneck G, Dijkers MP. Difficult to measure constructs: conceptual and methodological issues concerning participation and environmental factors. Arch Phys Med Rehabil. 2009 Nov;90(11 Suppl):S22-35 on the limitations of Rasch analysis in the context of participation (or instrumental ADL).

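As the figure below shows, the Rasch analysis ranks the items by difficulty as Reading > Work > Preparing meal > Heavy housework, which seems to contradict logic and experience!

Please explain this in Chinese, including its clinical implications.
Submit your answer by September 26.
Individual or team answers are welcome.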

image source: https://encrypted-tbn3.google.com/images?q=tbn:ANd9GcR-MLm6Mt_SX9T30vZrnUDcpR7dRotc6vbVBMn-qVW3g9pflovl

Comment: The results of table 2 are intriguing. Why is playing chess/cards more difficult than gainful work? The same for art activities? It seems to me that the result of this analysis suffered from a common problem with Rasch analysis, namely that frequency of performance is confused with difficulty of performance. Please see Whiteneck G, Dijkers MP. Difficult to measure constructs: conceptual and methodological issues concerning participation and environmental factors. Arch Phys Med Rehabil. 2009 Nov;90(11 Suppl):S22-35 on the limitations of Rasch analysis in the context of participation (or instrumental ADL).

* Response: It was our mistake to state that “The items are largely listed in increasing level of difficulty.” We have deleted the statement. The items’ level of difficulty cannot be determined simply by their order (or by their step difficulty parameters) for two reasons. The first is that the items have different numbers of difficulty steps (i.e., different numbers of response categories). The second is that the discrimination parameter also affects the hierarchy of item difficulty. That is, some items might be more difficult at certain levels of function, but they might be easier at other levels of function. This is a potential drawback of the 2-parameter model. We have addressed this issue as follows:

“We used the GPCM (a type of 2-parameter IRT model, including discrimination and difficulty parameters)³⁴ to fit the patients’ responses and estimate the item parameters for the ADL item bank. The GPCM is a flexible model from the parametric, unidimensional, polytomous-response IRT models. Because GPCM allows discrimination to vary from item to item, it generally fits response data better than a 1-parameter model (e.g., Rasch model).³⁵ However, using a 2-parameter IRT model may have made the results difficult to interpret. For example, an item’s level of difficulty cannot be simply determined by its “difficulty” parameter (i.e., step threshold) because the discrimination parameter (i.e., slope) also affects the hierarchy of item difficulty. That is, some items might be more difficult at certain levels of function, but they might be easier at other levels of function. This is a potential drawback of the 2-parameter model. Nevertheless, the 2-parameter IRT model is suggested due to its flexibility and is adopted by several health-related CATs.¹²,¹⁵,³⁶”
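Editor's illustration (not from the manuscript): the point that an item can be harder at one level of function and easier at another can be sketched with the standard GPCM category-probability formula. The two items, their slopes, and their step thresholds below are hypothetical values chosen only to make the crossing visible; a higher expected score is read here as "easier" (more independent).

```python
import math

def gpcm_probs(theta, a, thresholds):
    """Category probabilities (categories 0..m) for one GPCM item.

    a          : discrimination (slope)
    thresholds : step parameters b_1..b_m
    """
    # Category k has logit sum_{v<=k} a * (theta - b_v); category 0 has logit 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, a, thresholds):
    """Expected item score at ability theta."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, thresholds)))

# Hypothetical items: the steep item has the higher thresholds,
# so it looks "harder" if only the thresholds are compared.
item_flat = {"a": 0.5, "thresholds": [0.0, 0.0]}
item_steep = {"a": 2.0, "thresholds": [0.5, 0.5]}

for theta in (-2.0, 0.0, 2.0):
    flat = expected_score(theta, **item_flat)
    steep = expected_score(theta, **item_steep)
    harder = "steep" if steep < flat else "flat"
    print(f"theta={theta:+.1f}  flat={flat:.2f}  steep={steep:.2f}  harder item: {harder}")
```

With these assumed values, the steep item is the harder one at low and average function but the easier one at high function, so no single ordering of the two items holds across the whole scale.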

Furthermore, “Even if the discrimination parameter is fixed, explanations of an item’s value of “difficulty” parameter have to be cautious. For example, our results showed that the value (3.54) of difficulty parameter of “volunteer work” was unexpectedly higher than that (2.53) of “gainful work” (Table 2). The value of difficulty parameter is commonly interpreted as the “level of difficulty or challenge” of a task. The difficulty parameter of an item is influenced by how often patients perform the task.²² Appendix 1 shows that only a small proportion of the patients had been working (gainful work (7.1%) and volunteer work (3.1%)). However, that frequency might not necessarily indicate the level of difficulty of a task. Particularly, the patients tested were all outpatients, who were unlikely to seek and do volunteer work. On the other hand, gainful employment is often an obligation or responsibility. Thus, the value of difficulty parameter is not equal to the level of challenge of an ADL task.³⁷ In addition, as aforementioned, we used a 2-parameter model, not the 1-parameter IRT model. Due to these issues, our data are not sufficient to determine whether an item with a low value of difficulty (e.g., doing gainful work) is less challenging than an item with a high value of difficulty (e.g., doing volunteer work).”
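Editor's illustration (not the authors' estimation procedure): the dependence of a difficulty parameter on how often a task is performed can be seen with a crude log-odds calculation. It assumes a dichotomous Rasch item and a reference person at ability 0 whose probability of doing the task equals the observed proportion, which gives b = ln((1 - p) / p). The actual analysis used a polytomous GPCM, so only the ordering of the two numbers is meaningful here.

```python
import math

def crude_logit_difficulty(p_endorse):
    """Log-odds of NOT doing the task: a rough Rasch-style difficulty
    for a reference person at ability 0 (an assumption of this sketch)."""
    return math.log((1.0 - p_endorse) / p_endorse)

# Proportions reported in the response above (Appendix 1 of the manuscript).
gainful = crude_logit_difficulty(0.071)    # gainful work, 7.1%
volunteer = crude_logit_difficulty(0.031)  # volunteer work, 3.1%

print(f"gainful work:   {gainful:.2f}")
print(f"volunteer work: {volunteer:.2f}")
```

The rarer activity (volunteer work) gets the larger value purely because fewer patients do it, which is the frequency effect the response describes; the calculation says nothing about how challenging the task is.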

Your further comments are appreciated!

16 comments:

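林恭宏, September 20, 2012, 2:32 PM
Professor, are the items and the administration method of the FAI CAT the same as those of the paper-based FAI?
The paper-based Chinese version of the FAI asks how frequently a person engages in IADL activities, not about their ability to perform them or their perceived difficulty. So I would like to clarify whether the "difficulty" in your post refers to how frequently the person performs the item. ("Reading difficulty > Work > Preparing meal > Heavy housework")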

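Anonymous, September 20, 2012, 8:10 PM
This reviewer's challenge really hits the mark... Moreover, the article by Whiteneck et al. questions whether IRT can account for a multidimensional construct such as participation, especially since each item carries a "personally specific" meaning for each individual. For example, how often people read differs from person to person to begin with; when reading is placed on IRT's single hard-to-easy continuum, these stark between-person differences are easily diluted or magnified... Besides, people in Taiwan generally seldom read as a hobby... so in terms of difficulty, the IRT model places reading near the top...

Conversely, work is done every day, so its frequency is higher than that of reading, and the difficulty that IRT yields is therefore not very high.

Possible solutions I can think of:

1. Emphasize that IRT is used here to examine construct validity, and that the ordering of item difficulty is secondary.

2. Reword every mention of "difficulty" as "frequency": items at the top are those performed less often, and items at the bottom are those done, or doable, almost every day. In terms of what each item means, all are useful for the CAT. For example, if a patient can read in daily life (or find time to read despite a busy schedule), their IADL participation in work and household chores is probably already well in hand. What is more, your participants are stroke patients (reading is a very simple activity, but few patients are in the mood for leisurely reading after falling ill... all the more so in Taiwanese culture).

These are my first impressions... Good luck with the resubmission!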
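Wen-Hsuan Hou, September 21, 2012, 12:05 AM
Dear Professor Hsieh:
Doesn't the Frenchay Activities Index measure the "frequency" of participation?
So in this context, doesn't the Rasch item difficulty represent exactly that frequency?
PS: This is only my offhand, humble opinion; I would be grateful for your guidance!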
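Ching-Lin 清麟, September 21, 2012, 8:03 AM
Yes, the FAI items measure "frequency of participation/engagement."
Rasch theory is built on how frequent a response is: if all items measure the same dimension, they can be ordered by frequency. In education/testing (the original application area of Rasch and IRT), this ordering represents difficulty (the fewer people who "pass" an item, the "more difficult" it is). That is why the main parameter in the Rasch model is "difficulty."
Other IRT models also consider parameters such as discrimination and guessing.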
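Editor's illustration: the link between pass frequency and difficulty in the Rasch model can be sketched as follows. Under the dichotomous Rasch model the pass probability depends only on the difference between ability and difficulty, so items fall in the same order at every ability level. The item names and difficulty values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Probability that a person with ability theta passes an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical items with assumed difficulties (in logits).
item_difficulties = {"item_A": -1.0, "item_B": 0.5, "item_C": 2.0}

def order_by_pass_probability(theta):
    """Items sorted from most to least likely to be passed at this ability."""
    return sorted(
        item_difficulties,
        key=lambda name: rasch_p(theta, item_difficulties[name]),
        reverse=True,
    )

orders = [order_by_pass_probability(theta) for theta in (-3.0, 0.0, 3.0)]
for theta, order in zip((-3.0, 0.0, 3.0), orders):
    print(theta, order)
```

Because every item shares the same slope, the least frequently passed item is the one with the highest difficulty for every person, which is why a single frequency-based ordering is possible in the Rasch model but not necessarily in models with item-specific discrimination.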

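Thank you all for participating!
Some "opinions" are already close to my own view (and should be the ideal response), so I am holding them back for now and not displaying them.

Please keep submitting your "answers"!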
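Anonymous, September 25, 2012, 3:16 PM
The figure you posted includes an item, BI2-Bathing. As I recall, the BI scoring criteria (independent, needs assistance, etc.) seem to differ from those of the FAI (frequency), which may be another point that confused the reviewer (difficulty vs. frequency). (This is just my guess.) In your response, take care to sidestep this or explain it specifically.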
Ching-Lin 清麟, September 26, 2012, 3:41 PM
The figure is only an "example." The response categories of the actual ADL CAT items (both BADL and IADL) are all "level of independence."
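尤菀薈 (Patrice Yu), September 28, 2012, 9:40 PM
Professor, excuse me, I have a question about the argument "Particularly, the patients tested were all outpatients, who were unlikely to seek and do volunteer work."
Outpatients should have fairly flexible schedules (rehabilitation on a few days a week), so if they have recovered well, they might still be able to volunteer. I wonder whether volunteering is simply something people do only if they are interested, so the proportion of the general population who volunteer may be low to begin with. As for patients with chronic stroke, even those who have recovered quite well may still feel physically limited, so the proportion who volunteer may be lower still. This is my thinking; I would appreciate your comments. Thank you!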
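Ching-Lin 清麟, September 29, 2012, 5:45 AM
I did not cite a reference, so "unlikely" is only a "possible," "subjective" judgment.
There is no need to worry about it.
If one "deliberately" wanted to find supporting literature, I think that would not be hard.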
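黃怡靜, October 2, 2012, 3:39 PM
Professor, what is the discrimination parameter?
Its name sounds like discriminant validity, but why does this parameter affect the ordering of item difficulty?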
Ching-Lin 清麟, October 3, 2012, 6:32 PM
Just ask me in person next week!