IROS 2019 International Conference Proceedings: Decoding the Perceived Difficulty of Communicated Contents by Older People: Toward Conversational Robot-Assistive Elderly Care

IEEE Robotics and Automation Letters (RAL) paper presented at the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019

Decoding the Perceived Difficulty of Communicated Contents by Older People: Toward Conversational Robot-Assistive Elderly Care

Soheil Keshmiri¹, Hidenobu Sumioka¹, Ryuji Yamazaki², and Hiroshi Ishiguro¹,³

*This research was supported by JST CREST Grant Number JPMJCR18A1, JSPS KAKENHI Grant Number JP19K20746, and ImPACT Grant Number 2014-PM11-07-01.
¹ Soheil Keshmiri and Hidenobu Sumioka are with Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan. Hiroshi Ishiguro is with Graduate School of Engineering Science, Osaka University, Japan. {soheil,sumioka}@atr.jp
² Ryuji Yamazaki is with School of Social Sciences, Waseda University, Japan. rysaoni.waseda.jp
³ Hiroshi Ishiguro is with Graduate School of Engineering Science, Osaka University, Japan, and is the Visiting Director of Hiroshi Ishiguro Laboratories (HIL) at ATR. ishiguro@sys.es.osaka-u.ac.jp
Copyright 2019 IEEE

Abstract: In this study, we propose a semi-supervised learning model for decoding the perceived difficulty of communicated content by older people. Our model is based on mapping the older people's prefrontal cortex (PFC) activity during their verbal communication onto fine-grained cluster spaces of a working memory (WM) task that induces loads on humans' PFC through modulation of its difficulty level. This allows for differential quantification of the observed changes in the pattern of PFC activation during verbal communication with respect to the difficulty level of the WM task. We show that such a quantification establishes a reliable basis for categorization and, subsequently, learning of the PFC responses to more naturalistic contents such as story comprehension. Our contribution is to present evidence on the effectiveness of our method for estimating the older people's perceived difficulty of the communicated contents during an online storytelling scenario.

I. INTRODUCTION

A distinct attribute of robots in comparison with other media is their physical embodiment, which allows for a sense of togetherness [1]. Research suggests that children who read with a learning-companion robot consider their reading companion to support their reading comprehension and that it motivates a deepening social connection [2]. Along the same direction, Mann et al. [3] find that people are more responsive to robots than to computer-based healthcare systems. Additionally, Keshmiri et al. [4] identify that telecommunicating through a humanoid results in the older people's brain exhibiting an activation pattern similar to that of in-person communication.

These findings unanimously identify the potential of robots for improving the accessibility, consistency, and quality of our public and medical care services. At the same time, they also imply the necessity for increased social interaction ability of robots [5] if we are to harness their potential and positive impacts on our social lives in earnest. After all, social interaction is a bidirectional communication channel, and interactive media that can comprehend their human companions' expectations and respond accordingly are the minimum requirement if such interactions and relationships are to last long.

However, enabling robots to interact with humans is a complex task, and even more so when it comes to humans' verbal communication: a conversation that resonates with one person may not sound the same to another, people lose their attention at different paces, and individuals perceive the difficulty of a topic in their own ways. Despite substantial advances in facial feature analysis [6], facial expressions may not be as informative in the case of verbal communication. For instance, a frowning face while listening to a conversation might signal attention or difficulty in following a statement rather than discomfort or anger. Such contextual effects during verbal communication are highly subjective (i.e., they vary from individual to individual) and internalized: they may not be immediately available through conventional responses such as facial expression.

The brain, as the base for behavioural responses, can help alleviate some of these shortcomings. In particular, a brain-based approach to human-robot interaction is well-suited for verbal communication in which robotic media need to track the perceived complexity of the conversational topic by their human companions in order to sustain their interaction through modulation of the communicated content. Such an ability can be especially helpful when these agents interact with individuals who struggle with expressing themselves (e.g., overstressed or shy persons and individuals with such diseases as selective mutism).

In this article, we aim at online estimation of the older people's perceived difficulty of communicated contents during verbal communication based on the pattern of their prefrontal cortex (PFC) activation. We focus on storytelling as a first step toward decoding of the conversational communication since stories' scripts can be kept intact and repeated to different individuals without any change in their contents, thereby allowing for the control of such confounders as subtle differences in conveyed information. In this context, the core issue is how to evaluate the individual's perceived difficulty of a verbally communicated content, considering the lack of an objective quantification for such perceptions. Here, we hypothesize that the perceived difficulty of a verbal communication is reflected in the cognitive load that a person experiences. In cognitive psychology, cognitive load refers to the effort that is endured by the working memory (WM): the core component of human cognition that includes language comprehension [7]. Previous studies have formulated such simple WM tasks as mental arithmetic (MA) [8] and n-back [9] to quantitatively evaluate the level of cognitive load. Furthermore, functional imaging has provided considerable evidence that the neural correlates of the WM process reside in the PFC [8], [9], [10].

We propose to evaluate the perceived difficulty of communicated contents during verbal communication via cognitive loads that are estimated based on brain activities during simple WM tasks. Specifically, we first organize cluster spaces that are formed through application of the K-means algorithm [11] to the near-infrared spectroscopy (NIRS) time series of older people's PFC activity in response to the cognitive load induced by an n-back (n = 1, 2) auditory task (referred to as NBT hereafter). In this task, participants are required to recall the reoccurrences of sequential (i.e., n = 1) or every-other (i.e., n = 2) occurrence of numerical values (1 through 9). We use NBT since it forms a better basis for quantification of the verbally communicated contents, considering its effect on PFC [12] and its ability in identifying the change in PFC activation in response to individuals' emotions and change in mood [10]. Next, we map older people's PFC activity during an easy/hard listening task (referred to as EHL hereafter) onto the NBT clusters. This mapping serves as a refinement that allows for objective quantification of the brain activation during verbal communication based on well-defined clusters of the n-back, thereby including the PFC information during EHL that is not available in a pure WM task setting. EHL is designed to induce different levels of cognitive load on older people's PFC by modulating its communicated information. This mapping process results in quantification of the frontal activities during EHL according to their proximity to the NBT clusters' centroids (i.e., their centers): a process referred to as cross-labeling (e.g., label 1 if PFC activity is closer to the n = 1 cluster, or 2 otherwise). Last, we use these cross-labeled PFC activities to train a linear supervised classifier for decoding the older people's PFC responses to online communicated topics.

We show that our method can capture the cognitive load of older people during a natural storytelling scenario and that its estimation is associated with the older people's self-assessment of the difficulty of the story. Our contribution to human-robot interaction is to form the first (to the best of our knowledge) preliminary step toward conversational-based robot-assistive elderly care via
enabling these media to predict the difficulty of their verbally communicated content (e.g., while telling a story in an elderly care setting) as perceived by the older people.

II. METHODOLOGY

Figure 1 shows an overview of our method. It consists of five steps: A) feature extraction, i.e., calculating the information content of the brain activity, B) cluster formation using the participants' PFC activity in response to the cognitive load induced by NBT, C) NBT-based cross-labeling of older people's PFC activities during EHL, which involves their labeling based on their proximity to the NBT clusters' centroids (e.g., label 1 if PFC activity is closer to the n = 1 cluster, or 2 otherwise), D) training a linear supervised model with cross-labeled EHL data, and E) online estimation of the perceived difficulty of stories. In the following sections, we explain each step in detail.

Fig. 1. Model's schematic diagram. (A) DE feature vectors for PFC activity of the participants in response to one- and two-back WM tasks were calculated using equation (1). (B) These feature vectors were used to form clusters C1 and C2 through application of the K-means algorithm [11]. (C) The C1 and C2 clusters were utilized for labeling the DE feature vectors of the EHL time series of PFC activity of human subjects by mapping these vectors onto C1 and C2 based on their L2-norm (i.e., Euclidean) distance to the centroids of C1 and C2 (i.e., their respective centers), resulting in the formation of the EHL-based clusters L (short for lower cognitive load) and H (short for higher cognitive load). (D) This cross-labeled data was further used for training a linear supervised classifier [13] for online classification of the PFC activity of older people in response to communicated contents. During training, 80.0% of EHL was used as training data. We used the remaining EHL data for cross-validation (CV). (E) The trained linear supervised model was used for online estimation of the perceived difficulty of communicated contents by older people during conversation. (F) Once the session was over, the model counted the number of DE feature vectors that were classified as members of the L or H clusters. Subsequently, it labeled the session as difficult/easy if the number of DE feature vectors assigned to H/L during the session was larger than the number assigned to L/H, thereby returning this count along with the average of the L2-norms of the DE feature vectors of the selected cluster.

A. Choice of Feature Space

Previous results [13], [14] indicated that differential entropy (DE) (i.e., the average information content of a continuous random variable) significantly outperforms feature spaces that are predominantly used for classification of f/NIRS time series of human subjects' PFC activity. Due to these results, we used the linear estimate of DE for extracting features of the PFC activity.

B. Clustering of NIRS Time Series of n-back WM Task

Figure 1 (A) and (B) show this process. We formed our n-back WM cluster spaces through application of the K-means algorithm [11] with two centroids to the DE feature vectors of every five-second-long NIRS time series of PFC activity during one- and two-back WM tasks. This resulted in the formation of two clusters (i.e., C1 and C2, Figure 1 (B)). We computed a DE feature vector (i.e., V in Figure 1 (A)) for a given n-back NIRS time series of PFC activity of each participant as [15]:

$$H(X_j) = \frac{1}{2}\log_2\left(2\pi e\,\sigma^2_{X_j}\right) \tag{1}$$

where $\sigma^2_{X_j}$ is the variance of the $j$-th non-overlapping segment $X_j$ of the entire time series $X$ of the participant's PFC activity. It is worth noting that the interpretation of C1 and C2 as representatives of PFC activation in response to easy/difficult communicated contents finds evidence in the differential PFC activation in response to one- and two-back WM tasks [9]. In this study, we used data from [13] that pertained to twenty-eight adults' frontal activities (eleven males and seventeen females, M = 30.96, SD = 10.84) who performed one- and two-back tasks.

C. NBT-Based Cross-Labeling of EHL

Figure 1 (C) illustrates this step. We mapped the DE feature vectors of participants' NIRS PFC activity during EHL onto the C1 and C2 cluster spaces based on their L2-norm distances (i.e., Euclidean distances) to the centroids of these clusters. We labeled these vectors as easy (i.e., 1) if they were closer to C1's center or difficult (i.e., 2) if they were closer to C2's center. This resulted in the formation of clusters L (short for lower cognitive load) and H (short for higher cognitive load) that excluded NBT and were solely based on EHL. As a result, the EHL's labeling with respect to NBT established a correspondence between PFC activity in response to verbal communication and the clusters of NBT.

D. Training a Linear Supervised Model

Figure 1 (D) shows this process. We used 80.0% of the EHL cross-labeled data for training and the remaining 20.0% for cross-validation (CV) to train our linear supervised classifier. We used the linear supervised classifier in [13] that is based on a modified canonical linear regression. We chose this model due to its significantly improved accuracy in comparison with the classifiers dominantly adopted for the NIRS-based n-back WM task in the literature.

During the training, we adopted a brute-force search that started with a single feature (i.e., a feature vector of length one) and went through ten (i.e., feature vectors of length ten). For each of these lengths 2, we also checked whether inclusion of polynomial degrees to capture the interaction between the elements of a given feature vector can improve the performance. Therefore, we checked for polynomial degrees zero (i.e., no polynomial feature) through seven. We found that feature vectors of length four combined with a polynomial degree of two yielded the highest prediction accuracy. Therefore, we used the length-four feature vectors with a polynomial degree of two.

E. Online Estimation of the Perceived Difficulty

We used our trained linear model for online estimation of the perceived difficulty of communicated contents by older people during the storytelling experiment. At every prediction cycle (i.e., every 20 seconds in the current implementation), our model summarized the current PFC activity time series of the older people into its calculated feature vector. Next, it utilized the trained linear model to estimate the correspondence between the feature vector of the current PFC activity and the two clusters. It then returned the magnitude of the induced difficulty of the communicated topic at that prediction cycle (i.e., the L2-norm of its feature vector) along with its estimated label (i.e., whether closer to L's or H's centroid). The older people's perceived difficulty was estimated based on the total number of DE feature vectors that were classified as members of the L or H clusters.

III. EXPERIMENTS

We conducted two experiments to verify the ability of our model in capturing the older people's perceived difficulty of verbal communication. In the first experiment, we verified that the model trained with the data recorded during the EHL task had the ability to classify the NBT. In the second, we verified that the trained model was capable of estimating the perceived difficulty during storytelling (i.e., STE). Considering the two-class labeling in our approach (i.e., L = 1 and H = 2), the chance-level accuracy was 50.0%.

All participants were free of neurological and psychiatric disorders and had no history of hearing impairment. Subjects were seated in an armchair with head support in a sound-attenuated testing chamber, with instructions to fully relax with their eyes closed. All experiments were carried out with written informed consent from all subjects.

We used a minimalist-design humanoid called Telenoid (Figure 2 (b)) in our experiments. The motion of Telenoid was generated based on the voice of the operator, using an online speech-driven head motion system [16]. We placed Telenoid on a stand at a distance of approximately 1.4 meters from the seat of the participant (Figure 2 (a)).

Near-infrared spectroscopy (NIRS) [17] was used to collect the PFC activity of the participants. We chose NIRS due to its non-invasive operational setup, portability, and relative immunity to body movement [18]. In our experiments, we acquired NIRS time series data of the participants using a [...]

Fig. 2. (a) Experimenter demonstrates experimental setup. (b) Telenoid.

[...]ing one-minute-long resting data that was then followed by its corresponding topic. We kept the communicated contents intact in all the sessions. However, we randomized the order of easy/difficult contents among participants. Every subject participated in all of these settings.

For model verification, we used the original labeling of NBT data for one- and two-back WM tasks (i.e., prior to K-means clustering) from [13]. This enabled us to determine whether the induced cognitive loads during NBT formed a proper basis for quantification of the cognitive demands on PFC during verbal communication. We considered our model's prediction a true positive (tp) if its estimation and the NBT's original label were both 2 (i.e., difficult, Section II-C). Similarly, we considered it a true negative (tn) if the estimated and original labels were both 1 (i.e., easy, Section II-C). Otherwise, we considered the estimate a false positive (fp) (i.e., predicted label = 2 and original label = 1) or a false negative (f[...]
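The tp/tn/fp counting rule described in this verification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `confusion_counts` and `accuracy`, the toy label lists, and the accuracy formula are hypothetical additions, and the false-negative case (predicted label = 1, original label = 2) follows the standard definition because the source text is cut off at that point.

```python
from collections import Counter

POSITIVE, NEGATIVE = 2, 1  # 2 = difficult (two-back), 1 = easy (one-back)


def confusion_counts(predicted, original):
    """Count tp, tn, fp, fn with label 2 treated as the positive class."""
    if len(predicted) != len(original):
        raise ValueError("predicted and original must have the same length")
    counts = Counter(tp=0, tn=0, fp=0, fn=0)
    for p, o in zip(predicted, original):
        if p == POSITIVE and o == POSITIVE:
            counts["tp"] += 1
        elif p == NEGATIVE and o == NEGATIVE:
            counts["tn"] += 1
        elif p == POSITIVE and o == NEGATIVE:
            counts["fp"] += 1
        elif p == NEGATIVE and o == POSITIVE:
            # Assumed by symmetry with the fp case; not stated in the source.
            counts["fn"] += 1
        else:
            raise ValueError(f"unexpected label pair: {(p, o)}")
    return dict(counts)


def accuracy(counts):
    """Fraction of correct predictions (hypothetical summary metric)."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else float("nan")


# Hypothetical toy labels, for illustration only (not data from the paper).
predicted = [2, 2, 1, 1, 2, 1]
original = [2, 1, 1, 2, 2, 1]
counts = confusion_counts(predicted, original)
print(counts, accuracy(counts))
```

With two balanced classes, an accuracy computed this way would be compared against the 50.0% chance level mentioned in the experiments.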

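Steps A to C of the methodology and the end-of-session count of Fig. 1 (F) can be sketched as below: the differential entropy feature of equation (1), assignment to the nearer of the C1/C2 centroids by Euclidean distance, and a majority vote over the per-cycle labels. This is a minimal sketch under several assumptions that are not in the paper: the function names, the use of population variance, the segment length, the tie-breaking toward "easy", and the centroid values are all hypothetical, and the K-means step that would produce the centroids is taken as already done.

```python
import math
from statistics import pvariance


def de_features(series, segment_len):
    """Equation (1) per non-overlapping segment: 0.5 * log2(2*pi*e*variance).

    Assumes population variance; a constant segment (zero variance) is rejected
    because the logarithm would be undefined.
    """
    feats = []
    for start in range(0, len(series) - segment_len + 1, segment_len):
        var = pvariance(series[start:start + segment_len])
        if var <= 0:
            raise ValueError("segment has zero variance")
        feats.append(0.5 * math.log2(2 * math.pi * math.e * var))
    return feats


def cross_label(vector, centroid_c1, centroid_c2):
    """Return 1 (easy) if the vector is at least as close to C1 as to C2, else 2."""
    d1 = math.dist(vector, centroid_c1)
    d2 = math.dist(vector, centroid_c2)
    return 1 if d1 <= d2 else 2  # ties go to "easy": an assumption


def label_session(labels):
    """Majority vote over per-cycle labels; returns (label, n_L, n_H)."""
    n_l = sum(1 for lab in labels if lab == 1)
    n_h = sum(1 for lab in labels if lab == 2)
    return ("difficult" if n_h > n_l else "easy"), n_l, n_h


# Hypothetical toy signal and centroids, for illustration only.
signal = [0, 1, 0, 1, 0, 2, 0, 2]
vector = de_features(signal, segment_len=4)  # two DE values, one per segment
label = cross_label(vector, centroid_c1=[1.0, 1.0], centroid_c2=[2.0, 2.0])
print(vector, label, label_session([1, 2, 2, 1, 2]))
```

In the paper the labels fed to the session-level count come from the trained linear classifier rather than directly from the centroids; the nearest-centroid rule is used here only because it is the part of the pipeline the text specifies completely.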