Text and Data Mining (TDM) from HKUST Licensed Material: Home

This guide helps HKUST users learn which publishers permit text and data mining under the Library's regular subscriptions.

Text and Data Mining - HKUST Subscriptions

In autumn 2020, HKUST Library's Research Support Services conducted a small study on text and data mining (TDM) of Library-subscribed resources. The findings appeared in a Research Bridge article, Text and Data Mining: Full-text Databases.

The majority of publishers that support TDM offer the service free of charge. However, there are usually rules and requirements to fulfill, and the data mining and data delivery methods can differ considerably from one publisher to another.

Commonly Seen Terms and Conditions:

  • Use for non-commercial research purposes.
  • Can only text-mine subscribed and open access content.
  • Follow the download limit, e.g. 3 requests per second.
  • Do not share the data with third parties.
  • Delete the data once the project ends.
  • Use APIs to extract data instead of crawling the database by web robots, spiders, etc.
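
A download limit such as the "3 requests per second" example above can be respected by pausing between calls. The following is a minimal sketch, not any publisher's official client: the class name Throttle and its parameters are hypothetical, and the actual limit must be taken from each publisher's terms.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper).

    max_per_second=3 mirrors the example limit quoted in this guide;
    check the publisher's own terms for the real value.
    """

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In use, one would create a single throttle = Throttle(3) and call throttle.wait() immediately before every API request, so that requests are spaced at least one third of a second apart.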

The TDM terms of use for Cambridge and Sage are very similar to those listed above.

Elsevier

Elsevier allows a certain amount of TDM of subscribed content.

  • Supplies over 40 APIs for Elsevier’s products including Scopus, ScienceDirect, SciVal, PlumX, and others.
  • Users need to obtain an API key via Elsevier’s Developer Portal.
  • HKUST researchers have access only to content that HKUST subscribes to, plus open access content.
  • An Object Retrieval API is available for mining images.
  • There are limits on how much and how fast you can harvest for TDM:

    "...there are no hard limits on the number of items that may be downloaded via our API. Nevertheless, a reasonable and customary rate limit remains in place to ensure equal access to the API for all users, and we continue to ask users to use our service responsibly.

    We understand the need to be flexible and continue to monitor usage and consult with researchers. However, we do reserve the right to deactivate any API key if we believe usage is abusive or impacting the stability of our systems." -  Text and data mining FAQs

More info available here: https://www.elsevier.com/open-science/research-data/text-and-data-mining

Factiva

Text & Data Mining with Factiva requires a separate license.

The Library can provide a contact person for researchers who wish to request a quote.

Gale - Cengage

Gale (Cengage): Data Mining FAQs

  • HKUST has signed an Addendum to the licensing contract.
  • For researchers' personal use only.
  • Delivers data on a hard drive; the Library is responsible for arranging user access to the content on the drive.
  • Charges a fee based on the cost of production and delivery.

JSTOR

JSTOR Dataset Services
Anyone can request a dataset through either of the two services below.

  • Self-service: limited to 25,000 documents; does not cover full text.
  • Large/full-text request: by special request and requires an agreement about the use of the data.

Nexis Uni

LexisNexis has a section on their website where you can ask about using or purchasing their "Data as a Service" for larger datasets.

They also offer a personal consultation service for mining with the LexisNexis Bulk Content API.

ProQuest - TDM Studio

Researchers can pay extra to text and data mine ProQuest content that HKUST Library already owns or subscribes to via ProQuest TDM Studio.

SAGE

"Downloading articles from SAGE Journals for the purposes of text and data mining is expressly permitted in our standard licence agreements and our terms of use for no extra fee. You do not need to ask permission to systematically download articles provided that:

  • You only use the articles for non-commercial text and data mining.
  • You only download articles to which you have legitimate access, for example if they are open access or part of your institution's subscription. If you cannot view an article on SAGE Journals, you will not be able to download it.
  • You respect the following limits when downloading SJ content:
    • 1 request every 6 seconds – Monday to Friday between Midnight and Noon in the "America/Los_Angeles" timezone;
    • 1 request every 2 seconds - Monday to Friday between Noon and Midnight in the "America/Los_Angeles" timezone, and all day Saturday and Sunday."

 - https://journals.sagepub.com/page/policies/text-and-data-mining
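
The time-dependent limits quoted above can be turned into a small lookup that returns the minimum pause between requests. This is a sketch under stated assumptions rather than SAGE's own tooling: the function name sage_min_delay_seconds is hypothetical, and treating exactly 12:00 as the start of the "Noon to Midnight" window is an interpretation of the quoted policy. It needs Python 3.9 or later for zoneinfo.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The policy states its windows in the "America/Los_Angeles" timezone.
LA = ZoneInfo("America/Los_Angeles")


def sage_min_delay_seconds(when=None):
    """Return the minimum number of seconds between requests (hypothetical helper).

    `when` must be a timezone-aware datetime; it defaults to the current time.
    Assumption: 12:00 exactly is counted as part of the Noon-to-Midnight window.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    local = when.astimezone(LA)
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    if is_weekday and local.hour < 12:
        return 6.0  # Monday to Friday, Midnight to Noon: 1 request every 6 seconds
    return 2.0      # weekday afternoons and evenings, and all weekend
```

A download loop would call this function before each request and sleep for the returned number of seconds, so that a job spanning noon or a weekend boundary in Los Angeles adjusts its pace automatically.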

Springer Nature

Text and Data Mining at Springer Nature

  • Offers various APIs to facilitate TDM, e.g. Citations API, SN SciGraph APIs, and more.
  • Provides a selection of metadata formats such as JATS, Dublin Core, ONIX, or MARC records.
  • Supports argumentation mining.

Web of Science

HKUST Library's license with Clarivate (owner of Web of Science) allows creating and using custom datasets:

  • For internal, non-commercial purposes only
    1. Use the custom dataset for numerical or statistical analyses of data elements derived from the service
    2. Download the custom dataset for use in your own data analytics and proprietary tools
    3. Index the custom dataset for searching by authorized users and display results of such searches performed
    4. Create derivative databases consisting of the results of (1) to (3)

Limitations: You may not distribute, sublicense or publicize any portion of the custom dataset or derivative databases.

See Clarivate's Product/Service terms (p. 24)

Wiley

Wiley: Text and Data Mining

More Info - from Other Libraries

Text and Data Mining Resources - by Reese Manceaux of the Atkins Library at UNCC

Text Mining Resources - Princeton University Library

© HKUST Library, The Hong Kong University of Science and Technology. All Rights Reserved.