    Data1 has become so central to everything we do that it has its own branch of research. The emergent field of data science2 combines knowledge from mathematics, statistics, computer science and other disciplines with the scientific method3 to develop new ways of working with and gleaning meaning from the data all around us.

    Rights: gonin, 123RF Ltd

    Digital data

    Digital data refers to information in a digital format, typically consisting of binary code. It can be processed, stored and transmitted electronically.
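
    As a small illustration of what "binary code" means here, the sketch below turns a short piece of text into the ones and zeros a computer could store, then turns it back. The function names `to_binary` and `from_binary` are made up for this example, and UTF-8 is assumed as the text encoding; real systems use many different encodings and formats.

```python
def to_binary(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_binary(bits: str) -> str:
    """Rebuild the original text from space-separated 8-bit groups."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")


encoded = to_binary("Hi")
print(encoded)               # 01001000 01101001
print(from_binary(encoded))  # Hi
```

    Each group of eight digits is one byte, so the two letters of "Hi" become two bytes.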

    Data mining and data scraping

    Data scientists develop ways of training computers to perform techniques such as data mining4, which means combing vast amounts of data for meaningful patterns and extracting the information these patterns can tell us about the data. This is useful because it’s a way of extracting information from datasets far too large for a human to sift through.
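
    One very simple kind of pattern a data-mining program might look for is which items tend to appear together. The sketch below counts pairs of items that occur in the same shopping basket and keeps the pairs seen at least twice. The baskets, the function name `frequent_pairs` and the `min_count` threshold are all invented for illustration; real data mining works on far larger datasets and uses more sophisticated methods.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_count=2):
    """Count item pairs that share a basket; keep those seen min_count times or more."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("bread", "butter") and ("butter", "bread")
        # are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up example data: each inner list is one shopper's basket.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

print(frequent_pairs(baskets))  # {('bread', 'butter'): 3}
```

    With four baskets a person could spot the bread-and-butter pattern by eye. The same counting approach keeps working when there are millions of baskets, which is where the computer becomes essential.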

    Data scientists create specialised techniques for extracting data too. Data scraping5 is the practice of programming computers to pull large amounts of data from another program’s end-user output. That means writing software to extract6 data from another program after it has been displayed as information for human consumption. Normally, data transferred between machines or software programs uses structured formats designed for machines rather than for human readers. That kind of direct access isn’t always available, so data scrapers have to find ways of extracting the raw data from a feed of information meant for people. People and computers process information very differently, so this method of acquiring data can be very time-consuming or resource-intensive.
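
    To make the idea of scraping concrete, here is a minimal sketch that pulls the list items out of a web page written for human readers. It uses `HTMLParser` from Python's standard library. The page text, the class name `ListItemScraper` and the function `scrape_list_items` are invented for this example, and it assumes the data of interest sits inside `<li>` tags. Real pages are messier, and real scrapers need correspondingly more work.

```python
from html.parser import HTMLParser


class ListItemScraper(HTMLParser):
    """Collect the text found inside every <li> element of a page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self.items.append("")  # start a new, empty item

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside a list item.
        if self._in_item:
            self.items[-1] += data


def scrape_list_items(html):
    """Return the stripped text of each list item in an HTML string."""
    parser = ListItemScraper()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


# A made-up page: headings and paragraphs for people, with the data in a list.
PAGE = """
<html><body>
<h1>Pancakes</h1>
<p>A long personal story about weekend breakfasts.</p>
<ul>
  <li>2 cups flour</li>
  <li>1 egg</li>
  <li>1 cup milk</li>
</ul>
</body></html>
"""

print(scrape_list_items(PAGE))  # ['2 cups flour', '1 egg', '1 cup milk']
```

    The scraper ignores the heading and the paragraph and keeps only the list. That works here because the page follows a predictable layout. When the layout changes, the scraper has to be rewritten, which is part of why this approach can take so much effort.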

    Rights: The University of Waikato Te Whare Wānanga o Waikato

    Online recipe

    Some online chefs are said to use extraneous text to hide their work from data scrapers.

    Background photo by fabiobalbi, 123RF Ltd.

    These practices can have effects we notice around us without realising what causes them. A common use for data scraping is to trawl the internet for content that can be easily repurposed. Oddly enough, one of the most frequently scraped types of content is food recipes. One explanation sometimes given for why so many recipe pages start with several paragraphs of personal preamble is that it isn’t vanity: the extra text is there to confuse data-scraper programs, so that only real humans can easily reach the content they came for.

    So if everyone’s constantly collecting and using data, what might it tell them about you?

    Owning your data

    We hear data talked about a lot lately. The more data that is gathered and analysed, the fuller the picture it can give of what is going on in the world. Every time someone visits a website, posts to social media or pays for something electronically, somewhere a datum7 about them is logged in a much larger dataset. Even school performance is collected as data. Increasingly, it’s becoming important that individuals can find out what data is being gathered about them and have a say in who has access to it.

    Millions of copyrighted books, articles, essays, and poetry provide the “food” for AI systems.

    The Authors’ Guild, Open Letter to Generative AI Leaders

    One example is in the field of artificial intelligence (AI) research and development. Large language models (LLMs) like ChatGPT and generative AIs like Midjourney are trained using vast amounts of data – in this case, existing examples of human writing and art. The machines are able to digest this data to form their version of knowledge about how humans write and draw, which they use to generate new text or images from a prompt. But many artists and writers argue that their property – in this case, writing and artwork – has been included in the training data without their consent, meaning the AIs can be used to create material that infringes on their ownership of the work they created.

    Where I would put my flag up for data sovereignty8 is when it comes to our stories or our narratives or our tikanga9.

    Sonny Ngatai, te reo Māori advocate

    The field of data sovereignty – who owns and accesses data – even includes questions like whether LLMs like ChatGPT should be allowed access to languages such as te reo Māori. Some experts argue that te reo is too precious a taonga to be fed into the AI datasets and altered by machines.

    Personal data can be used in many good ways. When students take a test in school, the data in their answers is analysed to determine how well they scored and it can be used for feedback. But once again, the issue is who has access to that information and what they’re able to do with it.

    Data in your world

    The use and creation of data has become commonplace for most people – but it is useful to take a moment to ask some questions:

    • What are some ways you collect and use data?
    • Can you think of instances of data in your world?
    • What are some ways that you get information — and how does this differ from data?
    • When you think about ways that you get information or knowledge, where might the raw data come from?
    • In what instances in your everyday life do you provide data for others?

    Data activities to sample with ākonga

    Pendulums – collecting and using data: students can gather data, process it into information and use that information to make predictions.

    Using radiocarbon carbon dioxide data: students work with pre-existing data in order10 to think about how information is presented and what it might mean.

    Measuring the power output of elite athletes: students learn about some of the ways sportspeople use data to improve their performance.

    Related content

    The PLD webinar Digital tools for science learning introduces easy-to-use digital tools that can engage learners in real-time data collection.

    AI and generative learning have many positives and quite a few drawbacks. The Futures thinking toolkit can be customised to explore how changes in this technology may impact our lives and the lives of future generations.

    These Connected articles are useful in helping younger learners with the concept of data:

    Activity ideas

    There are numerous activities on the Science Learning Hub to facilitate learning about data. Use this link and then use the filters to narrow your search.

    1. data: The unprocessed information we analyse to gain knowledge.
    2. data science: A multi-disciplinary field that uses scientific methods and processes and algorithms to gain knowledge and insights from data.
    3. scientific method: The notion that there is a unique standard method central to scientific progress. There is no such unique standard method.
    4. data mining: Using software to search large amounts of data for patterns and information.
    5. data scraping: Automated practice that gathers computer-readable data from information displayed for human viewing.
    6. extract: (Noun) A chemical preparation containing the active ingredient in concentrated form. (Verb) To separate out or remove.
    7. data: The unprocessed information we analyse to gain knowledge.
    8. data sovereignty: The right and control over the collection, ownership and use of data. Information that has been converted and stored in digital form is subject to the laws of the country in which it is located.
    9. tikanga: Māori customs and traditions that have been handed down from the ancestors.
    10. order: A classification grouping that ranks above family and below class (kingdom > phylum > class > order > family > genus > species).
    Published 3 August 2023