1 This definition covers the process of combining the Fuzzy Match Files. After some final processing and de-duplication of cases, the output of this methodology is the new measures WBL Success Rates Master File.
Purpose
2 Learner outcome data used by the LSC as part of Performance Review, and the data used by ALI during inspections, focus on success rates extracted from the individualised student record (ISR) and the individualised learner record (ILR).
3 This methodology shows how to use the outputs of the Fuzzy Data Matching methodology to create the WBL Success Rates Master File. The output of this methodology is the basis for all WBL Success Rates calculations based around the new measures.
Relevant Collections
4 The method is run on data collected in the most recent freeze for each year, so that success rates can be calculated up to the most recent month.
Source Data
5 The method uses various other files to aid in producing the new measures master file.
Derived Variables and Output Datasets
6 The methodology produces the following derived variable(s):
7 The method produces the following datasets:
Detailed definition
8 This methodology follows on from preceding methodologies, which are carried out in the following order:
9 The methodology is followed by these methodologies:
10 The fuzzy matching process outputs a set of files that, when combined, produce a dataset matched consistently across all the years involved. This is possible because L03s have been made consistent across years.
11 The first step is to combine all the files using the provider number (L01), learner number (L03), programme type (A15) and the intermediate matching variable (matchv). Matchv denotes either the area of learning, for NVQ-only programmes, or the sector framework code, for apprenticeships.
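The combination step can be sketched as a full outer join of the yearly files on the four matching variables. This is a minimal illustration, not the LSC's own code: it assumes each yearly file is a list of records keyed by year labels such as "0102", that a later year's values overwrite an earlier year's, and that the sfl_<year> flags used later in the process are set here. The function name combine_years is hypothetical.

```python
from typing import Dict, List, Tuple

# Matching variables named in the methodology.
KEY_FIELDS = ("L01", "L03", "A15", "matchv")

Record = Dict[str, object]
Key = Tuple[object, ...]


def combine_years(files_by_year: Dict[str, List[Record]]) -> Dict[Key, Record]:
    """Full outer join of the yearly fuzzy-match files on KEY_FIELDS.

    Assumption: years are processed in sorted order and later values
    overwrite earlier ones. Each year a case appears in sets sfl_<year>
    to 1; years in which it is absent stay at 0.
    """
    years = sorted(files_by_year)
    combined: Dict[Key, Record] = {}
    for year in years:
        for rec in files_by_year[year]:
            key = tuple(rec[f] for f in KEY_FIELDS)
            case = combined.setdefault(key, {f"sfl_{y}": 0 for y in years})
            case.update(rec)
            case[f"sfl_{year}"] = 1
    return combined
```

Because L03s were made consistent across years by the fuzzy matching, a simple exact-key join is enough at this stage.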
12 The derived variables relating to the years of learning are then calculated by checking each year in turn, from 1993 through to 2025, to determine whether the start, planned end and actual end dates fall within that year. This deliberately wide range is used to capture all the data, because some highly implausible dates are recorded in the ILR.
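The year search can be sketched as follows. The methodology does not state the year boundary, so this sketch assumes academic years running from 1 August to 31 July, labelled by their first calendar year; the field names start_date, planned_end_date and actual_end_date are hypothetical.

```python
from datetime import date
from typing import Optional

FIRST_YEAR, LAST_YEAR = 1993, 2025  # extreme range used by the methodology


def year_of(d: Optional[date]) -> Optional[int]:
    """Return the year containing d by checking each year in turn.

    Assumption: a year runs 1 August to 31 July and is labelled by its
    first calendar year. Returns None for a missing date or a date
    outside the search range.
    """
    if d is None:
        return None
    for year in range(FIRST_YEAR, LAST_YEAR + 1):
        if date(year, 8, 1) <= d <= date(year + 1, 7, 31):
            return year
    return None


def derive_years(case: dict) -> dict:
    # Hypothetical field names for the three dates being checked.
    case["start_year"] = year_of(case.get("start_date"))
    case["planned_end_year"] = year_of(case.get("planned_end_date"))
    case["actual_end_year"] = year_of(case.get("actual_end_date"))
    return case
```

A date outside the range yields None rather than an error, which is one way of handling the implausible ILR dates.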
13 Learners are then checked to identify cases that went missing between years. Each case has a set of variables, sfl_0102 through to sfl_0506, that flag the years from which the case has drawn data. If a case appears in one year but not the next and has no recorded end date, it is flagged as having gone missing in the derived variable ‘miss’.
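The ‘miss’ rule can be sketched by comparing each pair of consecutive sfl flags. This assumes the flags hold 1 when the case drew data from that year, and that a missing end date is stored as None under the hypothetical field name actual_end_date.

```python
SFL_YEARS = ["0102", "0203", "0304", "0405", "0506"]


def flag_missing(case: dict) -> dict:
    """Set miss=1 if the case is in one year but not the next and has no end date."""
    miss = 0
    if case.get("actual_end_date") is None:
        # Compare each year with the following one; the final year has no successor.
        for this_year, next_year in zip(SFL_YEARS, SFL_YEARS[1:]):
            if case.get(f"sfl_{this_year}") == 1 and case.get(f"sfl_{next_year}") != 1:
                miss = 1
                break
    case["miss"] = miss
    return case
```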
14 Each case is then assigned an age band based on the learner's age at the start of the aim. Every case falls into one of two age bands: “16-18” or “19+”.
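A minimal sketch of the banding, assuming age is measured in completed years on the aim start date as the text states. Because only two bands exist, any learner aged 18 or under (including any recorded as under 16) falls into “16-18” here; the function names are hypothetical.

```python
from datetime import date


def age_at(dob: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def age_band(dob: date, start_date: date) -> str:
    """Assign one of the two bands, based on age at the start of the aim."""
    return "16-18" if age_at(dob, start_date) <= 18 else "19+"
```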
15 Next, where any remaining cases share the same values of the matching variables used at the start of this process, all but the one with the earliest start date are removed, provided that this does not remove any achievements from the file. This ensures that the same information is not counted twice.
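The de-duplication can be sketched as below. The proviso about achievements is open to interpretation; this sketch reads it as "keep the earliest-starting case and also keep any later duplicate that is an achievement". The field names start_date and achieved are hypothetical, and start dates are assumed to be sortable (for example ISO strings or date objects).

```python
from typing import Dict, List, Tuple

KEY_FIELDS = ("L01", "L03", "A15", "matchv")


def deduplicate(cases: List[dict]) -> List[dict]:
    """Keep the earliest-starting case per key, but never drop an achievement."""
    groups: Dict[Tuple, List[dict]] = {}
    for case in cases:
        groups.setdefault(tuple(case[f] for f in KEY_FIELDS), []).append(case)
    kept: List[dict] = []
    for group in groups.values():
        group.sort(key=lambda c: c["start_date"])
        earliest, later = group[0], group[1:]
        kept.append(earliest)
        # Assumed reading of the proviso: later achievements are retained too.
        kept.extend(c for c in later if c.get("achieved"))
    return kept
```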
16 Cases that share the same UPIN and L03 and are leavers who have not achieved are then examined. Any such case whose end date is earlier than that of the other cases meeting these criteria is marked as a transfer and therefore does not feature in success rate calculations. Transfers are flagged in the derived variable ‘trans’.
17 The derived variable ‘starters’ is the complement of ‘trans’: a case counts as a start if and only if it is not a transfer.
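The transfer and starter rules can be sketched together. This assumes a leaver is any case with an actual end date, that ‘achieved’ is a 0/1 flag, and that a case is a transfer when its end date is strictly earlier than the latest end date among the non-achieving leavers sharing its UPIN and L03. Field names other than UPIN, L03, trans and starters are hypothetical.

```python
from collections import defaultdict
from typing import List


def flag_transfers(cases: List[dict]) -> List[dict]:
    """Set trans and its complement starters on every case."""
    groups = defaultdict(list)
    for case in cases:
        case["trans"] = 0
        is_leaver = case.get("actual_end_date") is not None
        if is_leaver and not case.get("achieved"):
            groups[(case["UPIN"], case["L03"])].append(case)
    for group in groups.values():
        latest_end = max(c["actual_end_date"] for c in group)
        for c in group:
            # Earlier than another qualifying case, so treated as a transfer.
            if c["actual_end_date"] < latest_end:
                c["trans"] = 1
    for case in cases:
        case["starters"] = 1 - case["trans"]
    return cases
```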
18 The WBL Success Rates Live UPIN Files are then matched into the remaining data. These files flag which providers returned data in each year, so that success rates are produced only for years in which the provider was returning data to the LSC.
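The Live UPIN match can be sketched as a set lookup. The layout of the Live UPIN Files is not described here, so this sketch assumes they have been flattened into (UPIN, year) pairs and attaches a hypothetical live_<year> flag to each case.

```python
from typing import Iterable, List, Set, Tuple


def match_live_upins(
    cases: List[dict], live: Iterable[Tuple[object, str]], years: List[str]
) -> List[dict]:
    """Attach live_<year> = 1 if the case's provider returned data that year, else 0."""
    live_set: Set[Tuple[object, str]] = set(live)
    for case in cases:
        for year in years:
            case[f"live_{year}"] = int((case["UPIN"], year) in live_set)
    return cases
```

Success rate calculations can then restrict each year's figures to cases whose live flag for that year is 1.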
19 Any case flagged by ‘miss’ is then assigned an actual end year equal to the last year in which it was seen.
20 Any case that has gone missing but is not flagged as a transfer in ‘trans’ is marked as a leaver, so that it counts as a non-achievement in the success rates.
21 Any case without an actual end date that has not gone missing between years is flagged as ‘continuing’.
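The three final status rules can be sketched in one pass. This assumes ‘miss’ and ‘trans’ have already been set, labels each year by its first calendar year (so "0203" becomes 2002), and uses hypothetical leaver and achieved fields; cases that have not gone missing keep whatever leaver status they already had.

```python
SFL_YEARS = ["0102", "0203", "0304", "0405", "0506"]


def finalise_status(case: dict) -> dict:
    """Apply the end-year, leaver and continuing rules for missing cases."""
    if case.get("miss") == 1:
        seen = [y for y in SFL_YEARS if case.get(f"sfl_{y}") == 1]
        # Last year seen, labelled by its first calendar year ("0203" -> 2002).
        case["actual_end_year"] = 2000 + int(seen[-1][:2])
        if case.get("trans") != 1:
            case["leaver"] = 1    # counts as a non-achievement
            case["achieved"] = 0
    has_end = case.get("actual_end_date") is not None
    case["continuing"] = int(not has_end and case.get("miss") != 1)
    return case
```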
22 Variables involved in the matching process that are not needed in success rate calculations are removed from the file. The remaining data is then saved as the WBL Success Rates Master File.
Sample Code
23 The following sample code is available: