Project Overview

Many users struggle to find comprehensive health information because the information, even for narrow, well-defined topics, is scattered across websites, with no single page or site containing all the relevant information.

This study enables us to: (1) deepen our understanding of why users find it difficult to find comprehensive information about healthcare topics, and (2) provide explicit guidelines for how pages in healthcare websites should be linked to enable users to easily navigate through the site, with the ultimate goal of facilitating the process of finding comprehensive information.

Aims

Aim 1. Collect data on general, specific, and sparse pages and their links through the use of an automatic web crawler, database, link-traverser, and visualizer.

Aim 2. Analyze the data to understand the link structure between general, specific, and sparse pages within each of the top-10 sites with melanoma information.

Aim 3. Report the results and implications for the design of healthcare websites.

Participants

None. This study involves the analysis of website content.

Intervention

In this study, the distribution of facts related to melanoma is analyzed across 10 high-quality sites. Skin cancer is the most common type of cancer, and there exists a large amount of information on the Web about this disease. The following five melanoma topics are selected for detailed analysis:

  • Self-examination in the diagnosis of melanoma (self-examination)
  • Doctor's examination in the diagnosis of melanoma (doctor's examination)
  • Diagnostic tests used in the diagnosis of melanoma (diagnostic tests)
  • Disease stages used in the diagnosis of melanoma (disease stage)
  • Descriptive information related to melanoma risk and prevention (risk/prevention)

A two-step method is used for identifying a list of facts that are required for a comprehensive understanding of the five melanoma topics.

  1. A list of facts for each topic is identified by analyzing all 38 links across high-quality sites on the melanoma page in MEDLINEplus. The identification of facts about the five melanoma topics results in 14 facts for self-examination, 6 facts for doctor's examination, 6 facts for diagnostic tests, 13 facts for disease stage, and 15 facts for risk/prevention.
  2. Two experienced skin cancer physicians are asked to independently rate the importance of facts related to each of the five topics using a 5-point Likert scale of fact importance (1 = Not important to know, 2 = Slightly important to know, 3 = Important to know, 4 = Very important to know, 5 = Extremely important to know). The physicians are told to rate the importance of each fact keeping in mind a concerned user looking for the melanoma topic on the Web.

The goal of our analysis is to reveal whether general, specific, and sparse pages for each of the top 10 websites with melanoma information are systematically linked. The directed graph from the visualizer is analyzed to understand how the pages within each site are linked together. Our analysis examines the shortest path between the relevant pages, and reports how many links a user must click to reach at least one, or all, of the general, specific, and sparse pages needed to find comprehensive information about a topic.
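The shortest-path idea can be illustrated with a breadth-first search over a site's link graph. The following is a minimal sketch, not the study's actual tooling: the function name `click_distances`, the page names, and the example links are all hypothetical, and the graph is assumed to be stored as a dictionary mapping each page to the pages it links to.

```python
from collections import deque

def click_distances(links, start):
    """Return the minimum number of link clicks from `start` to every reachable page.

    `links` maps a page to the list of pages it links to (a directed graph).
    Pages that cannot be reached from `start` are absent from the result.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distances:  # first visit is the shortest in BFS
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances

# Hypothetical link structure for one site (illustrative only).
site_links = {
    "general": ["specific_a", "specific_b"],
    "specific_a": ["sparse"],
    "specific_b": [],
    "sparse": ["general"],
}

from_general = click_distances(site_links, "general")
print(from_general["sparse"])  # clicks needed to reach the sparse page
```

A page with no outgoing links, such as `specific_b` in this example, reaches nothing but itself, which is the kind of dead end this analysis is meant to expose.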

Findings

The distributions of facts across pages for all five topics were skewed towards pages having few facts.

No page in any site had all the facts for any topic.

No combination of pages within a site contained all the facts for four of the five topics.
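The three checks above (facts per page, whether any single page is complete, and whether the pages of a site jointly cover a topic) can be expressed with simple set operations. This is an illustrative sketch under stated assumptions: the function names, fact labels, and page contents are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

def facts_per_page_histogram(page_facts):
    """Count how many pages contain 1 fact, 2 facts, and so on."""
    return Counter(len(facts) for facts in page_facts.values())

def coverage_report(page_facts, topic_facts):
    """Report pages holding every topic fact and the facts missing from the whole site."""
    topic_facts = set(topic_facts)
    covered = set().union(*page_facts.values()) & topic_facts
    complete_pages = [
        page for page, facts in page_facts.items() if topic_facts <= set(facts)
    ]
    return {
        "complete_pages": complete_pages,
        "missing_from_site": topic_facts - covered,
    }

# Hypothetical topic with four facts and a site with four pages.
topic = {"f1", "f2", "f3", "f4"}
pages = {
    "page_a": {"f1", "f2", "f3"},
    "page_b": {"f1"},
    "page_c": {"f2"},
    "page_d": {"f3"},
}

print(facts_per_page_histogram(pages))
print(coverage_report(pages, topic))
```

In this made-up example most pages hold a single fact, no page is complete, and one fact is absent from the site entirely, mirroring the pattern described in the findings.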

The distribution of facts that were rated as very important and extremely important was only marginally different from the above results, and full coverage of those facts was inconsistent across the sites.

Further analysis suggests the existence of general pages (that cover many facts in a medium amount of detail), specialized pages (that cover few facts in a high level of detail), and sparse pages (that contain few facts in very little detail). The skewed distributions therefore appear to occur because there were many more specialized and sparse pages compared to general pages.
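The three page types can be sketched as a simple rule over the number of facts and the level of detail. The cut-off for "many" facts (`many_facts_threshold`) and the three detail labels are assumptions made for illustration; the study does not state the criteria in this form.

```python
from collections import Counter

def classify_page(num_facts, detail, many_facts_threshold=5):
    """Classify a page as general, specialized, or sparse.

    `detail` is one of "low", "medium", "high". The threshold for "many"
    facts is a hypothetical value, not one taken from the study.
    """
    many = num_facts >= many_facts_threshold
    if many and detail == "medium":
        return "general"
    if not many and detail == "high":
        return "specialized"
    if not many and detail == "low":
        return "sparse"
    return "unclassified"  # combinations the three definitions do not cover

# Hypothetical pages as (number of facts, detail level).
example_pages = [(9, "medium"), (2, "high"), (1, "high"), (1, "low"), (2, "low")]
type_counts = Counter(classify_page(n, d) for n, d in example_pages)
print(type_counts)
```

With more specialized and sparse pages than general ones, as in this toy input, the facts-per-page distribution is skewed towards pages with few facts.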

Conclusion

Few pages had many facts, many pages had few facts, and no single page or site provided all the facts. While such a distribution conforms to other information-related phenomena, the distributions were caused by a trade-off between depth and breadth, leading to the existence of general, specialized, and sparse pages.

The results make explicit the knowledge needed by searchers to find comprehensive healthcare information, and motivate the exploration of distribution-conscious approaches for the development of future search systems, search interfaces, Web page designs, and training.

The results also provide more justification for the behavior of search experts like healthcare librarians. Such search experts visit a combination of select sources in a specific order when searching for comprehensive information because they have acquired an inherent understanding of the complexities in the distribution of healthcare information across sources.