Subsection 11(5) of FOIA defines the term “dataset”. The definition contains a number of elements.
Electronic form
(5) In this Act “dataset” means information comprising a collection of information held in electronic form where all or most of the information in the collection—
This paragraph shows that the term only applies to information held in electronic form. If you only hold the requested information in another form, eg in hard copy, then the dataset provisions don’t apply. You are not required to turn the hard copy information into an electronic dataset.
This paragraph also defines a dataset as “a collection of information”. This implies that a dataset includes several data elements, rather than a single figure.
A collection is a dataset only if “all or most” of the information in it meets the three criteria in paragraphs (a) to (c) of the definition, which are explained below. If the collection does not meet these criteria, then it is not a dataset under FOIA.
Service or function
The first criterion is about why you hold the information:
…the collection -
(a) has been obtained or recorded for the purpose of providing a public authority with information in connection with the provision of a service by the authority or the carrying out of any other function of the authority…
This means that the information must be about the provision of your services or any of your other functions. You should interpret “in connection with” broadly. Your services and functions are ultimately about your purpose for existing. They are usually derived from your statutory powers or duties. Furthermore, the information must have been collected “for the purpose of providing the public authority with information” in connection with these services and functions. This suggests that it is more likely to be management information or information that you need to provide your services or carry out your functions.
Factual information
The next part of the definition specifies that “all or most of the information in the collection” must be “factual”. This does not simply mean numeric information. It could include, for example, not only figures for expenditure by departments, but also a list of property addresses.
Example
A local authority conducts a survey of visitors to a tourist attraction that it runs, to gather information for business planning. The survey includes questions on the mode of transport people used to visit the attraction and the first part of their home postcode. Both of these answers would count as factual information.
The survey also includes a question asking for any comments, in a free text box. These answers are less likely to count as factual information, if the local authority cannot measure or compare them in an objective way.
If the survey had asked respondents to tick a box showing how likely they were to return (eg very likely, likely, unlikely), then these results would count as factual information. They would show that x% of respondents said they were very likely to return, etc.
However, if the majority of the information was factual, then the survey results as a whole would still constitute a dataset, provided the other criteria in section 11(5) are met.
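As a rough illustration of the tick-box question in the example above, the short Python sketch below shows how the "x% of respondents" figures follow from nothing more than counting the recorded answers. The list of responses and the function name are invented for illustration only.

```python
from collections import Counter

# Hypothetical tick-box answers to "How likely are you to return?"
# (invented for illustration; not real survey data).
responses = [
    "very likely", "likely", "very likely", "unlikely",
    "very likely", "likely", "very likely", "likely",
]


def percentage_breakdown(answers):
    """Return the percentage of respondents who chose each option."""
    if not answers:
        return {}
    counts = Counter(answers)
    total = len(answers)
    # Counting and dividing only: no judgement about the answers is involved.
    return {option: 100 * n / total for option, n in counts.items()}


breakdown = percentage_breakdown(responses)
for option, share in breakdown.items():
    print(f"{option}: {share}%")
```

Free text comments could not be tallied this way without someone first deciding how to categorise them, which is why they are less likely to count as factual information.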
Not the product of analysis or interpretation
The term “factual information” is then qualified by two further criteria:
(b) is factual information which—
(i) is not the product of analysis or interpretation other than calculation …
This suggests that the definition is limited to ‘raw’ or ‘source’ data that you have produced or obtained, rather than value-added data that’s been produced by analysis or interpretation.
The Explanatory Notes to the Protection of Freedoms Act 2012 say:
“Examples of the types of datasets which meet the definition, though not a comprehensive list, will include datasets comprising combinations of letters and numbers used to identify property or locations, such as postcodes and references; datasets comprising numbers and information related to numbers such as spend data; and datasets comprising text or words such as information about job roles in a public authority” (note 394).
The examples given in the Explanatory Notes are of data collected or produced by the authority, without further analysis or interpretation. Examples include property postcodes or the amount spent by a department or the job roles within the authority.
The phrase “other than calculation” means that information such as totals or percentages is still captured by the definition, because it is inherent in the data itself. For example, you may have collected cost data at the level of sections within different departments. You could then add these values up to show the cost for each department. From there, you could show the total cost for your whole authority, or work out what percentage of the total cost each department incurs. All of this information would be captured under the “factual information” definition.
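The roll-up described above can be sketched in a few lines of Python. This is only an illustration: the department names, section names and figures are invented, and the function names are hypothetical. It shows that every output is derived purely by adding and dividing the recorded values.

```python
from collections import defaultdict

# Hypothetical section-level cost records: (department, section, cost in pounds).
# All names and figures are invented for illustration.
section_costs = [
    ("Housing", "Repairs", 120_000),
    ("Housing", "Lettings", 80_000),
    ("Highways", "Maintenance", 150_000),
    ("Highways", "Street lighting", 50_000),
]


def department_totals(records):
    """Add up section costs to give a total for each department."""
    totals = defaultdict(int)
    for department, _section, cost in records:
        totals[department] += cost
    return dict(totals)


def percentage_shares(totals):
    """Work out each department's percentage of the overall total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {dept: 100 * cost / grand_total for dept, cost in totals.items()}


totals = department_totals(section_costs)
authority_total = sum(totals.values())
shares = percentage_shares(totals)

print(totals)
print(authority_total)
print(shares)
```

Nothing in this sketch depends on policy, priorities or judgement. That is the distinction the phrase “other than calculation” draws between calculation and analysis or interpretation.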
However, it is likely that a table in a report proposing how you intend to allocate departmental resources in future years, based on your policies and priorities, would not be captured within the definition. This is because you produced this information by analysis and interpretation, rather than simply recording and making calculations with factual data. The analysis and interpretation depend on factors that are not inherent in the data itself.
We do not consider that quality checking the information, eg ensuring that you have recorded the entries consistently or correcting errors, would constitute analysis or interpretation. These processes are part of the normal work of collecting and checking management information, rather than further analysing or interpreting it.
This does not mean that an inaccurate or incomplete dataset is outside the definition. Even if the information that’s requested is inaccurate or incomplete, you must still release it unless an exemption applies. But if you do release incomplete or inaccurate information in response to a request, then it is sensible to explain this to the requester.
Official statistics
The second qualification to the term “factual information” is about official statistics:
(ii) is not an official statistic (within the meaning given by section 6(1) of the Statistics and Registration Service Act 2007) …
Official statistics, as defined in the Statistics and Registration Service Act 2007, are not datasets under FOIA. However, the underlying raw data used to produce official statistics could fall within the definition of a dataset, assuming the other criteria were satisfied.
Materially altered
The final part of the definition is that “all or most of the information in the collection”:
(c) remains presented in a way that (except for the purpose of forming part of the collection) has not been organised, adapted or otherwise materially altered since it was obtained or recorded.
To meet the definition, the information must remain presented in a way that has not been materially altered. This implies that the criterion is whether you have altered the presentation of the information, rather than the information itself. This means that if the information remains the same but the way it’s presented has changed, it can fall outside the definition.
However, the phrase “or otherwise materially altered” suggests that any such change, ie organisation or adaptation, would have to be significant. A minor change, such as reordering the columns in a spreadsheet, is unlikely to represent a material alteration to how it’s presented. The intention of the subsection is to define a dataset as a collection of raw data that is presented in essentially the same way that it was organised when you originally obtained or recorded it.
The phrase “except for the purpose of forming part of the collection” refers to the information in the dataset, since it is the information that forms part of the collection. In simple terms, a dataset is a collection of information where the way that the information is presented has not been materially altered since the information was first collected, apart from the work involved in putting the information into the dataset. The process of adding information to a dataset does not constitute a material alteration to its presentation.
When releasing a dataset in response to a request, you may need to redact exempt information, such as personal information that is exempt under section 40. For more information on this, see our guidance on How to disclose information safely. Redacting information in this way will not take the dataset out of the definition, even if the redactions are substantial. This is because the definition of a dataset in section 11(5) refers to how you hold the dataset, ie the original, unredacted version. It does not refer to a redacted version that you can release in response to a request.
There may be situations where, in order to answer a request, you extract data from various sources and compile a new table or spreadsheet. This may also involve calculating totals and percentages. Whilst you do still hold this new table or spreadsheet for the purposes of FOIA, it is not a dataset for the purposes of these provisions. The dataset provisions are about raw data. The new table or spreadsheet is not one that you obtained or recorded to provide you with information in connection with your services or functions. In compiling it, you have also materially altered the presentation of the original data since you originally obtained or recorded it.