INDEPTH Network

    Burkina Faso, Kenya, South Africa - Multi-centre Analysis of the Dynamics of Internal Migration and Human Capital in Selected INDEPTH Centres in Sub-Saharan Africa - Release 2016

    MADIMAH
    Reference ID INDEPTH.GH004.MIG.2014.v1
    Year 1992 - 2012
    Country Burkina Faso, Kenya, South Africa
    Producer(s) Collinson, Mark A. - Medical Research Council/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwater
    Sponsor(s) Swedish International Development Agency - Sida -
    National Research Foundation, South Africa - NRF -
    Wallonia-Brussels Federation of Belgium - FNRS -
    INDEPTH Secretariat - INDEPTH -
    South African Medical Research Council - SAMRC -
    Collection(s)
    MADIMAH
    Created on May 27, 2016
    Last modified Jun 22, 2016
    Data Processing
    Data Editing

    The HDSS collects data on residence episodes for each individual in the surveillance system.
    Data editing checks for basic inconsistencies in the dates and types of events: out-of-range values, coding errors and unusual frequencies. An important quality check is the construction of a matrix crossing each (current) event with the event that follows it, referred to as the event consistency matrix, which is used to check the coherence of event sequences.
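
The event consistency matrix described above can be sketched in Stata as a two-way tabulation of each event against the next one for the same individual. This is only an illustration on invented toy records: the variable name EventCode and its storage as a three-letter string are assumptions, and only the event codes themselves (OBE, DTH, OMG) come from this page or from common INDEPTH usage.

```stata
* Toy data: hypothetical records, not taken from any HDSS site.
* Assumption: the event type is held in a string variable called EventCode.
clear
input long IndividualId str3 EventCode str9 datestr
1 "ENU" "01jan2000"
1 "OMG" "15jun2003"
1 "OBE" "31dec2010"
2 "BTH" "10mar2001"
2 "DTH" "02feb2005"
2 "OBE" "31dec2010"
3 "IMG" "20aug2002"
3 "IMG" "05may2004"
3 "OBE" "31dec2010"
end
gen EventDate = date(datestr, "DMY")
format EventDate %td
drop datestr

* Attach to each record the code of the event that follows it
* (empty string on the last record of each individual)
sort IndividualId EventDate
by IndividualId: gen NextEvent = EventCode[_n+1]

* Rows: current event; columns: following event
tabulate EventCode NextEvent, missing
```

Cells that should be impossible, such as the IMG followed by IMG pair for individual 3 in the toy data, point to records that need checking.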

    List of instructions and recommendations to create an Event History Analysis (EHA) file
    Five steps are necessary to come up with the desired file:
    A. EHA file without covariates: this file has already been created in previous MADIMAH workshops in line with what are now INDEPTH standards (see INDEPTH Individual Level Data Specification V2.1, dated July 2012; and editorial by Osman Sankoh & Peter Byass in Int J of Epidemiology, April 2012).
    B. Create a file with covariates attached to individual identifiers and the corresponding round date.
    C. Create a file with covariates attached to household identifiers and the corresponding round date.
    D. “tmerge” EHA file and individual covariate file according to individual identifier and time (EHA-IND).
    E. “tmerge” this EHA-IND file and household covariate file according to household identifier and time (EHA-IND-HH).
    Step A: Harmonise EHA file (this step is a reminder from previous workshops):
    1. Harmonise variable names and labels according to INDEPTH standard.
    2. Use the matrix crossing “event” with “following event” to check that all order inconsistencies have been removed.
    3. Check that the file contains an end of observation (OBE) event and its corresponding date (e.g. 31 Dec 2010) for all individuals, including those who are currently out of the HDSS system (because of DTH or OMG). The date of OBE should be the same for all individuals.
    4. Sort by individual identifier IndividualId and date of event EventDate.
    5. Save file under the name “yoursite_core.dta” (with the name of your site).
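
Items 3 to 5 of Step A could be checked along the following lines. This is a minimal sketch on toy data, assuming a string variable EventCode holds the event type; the assert commands stop the do-file if an individual lacks an OBE record or if OBE dates differ.

```stata
* Toy data: hypothetical records. EventCode as a string variable is an assumption.
clear
input long IndividualId str3 EventCode str9 datestr
1 "ENU" "01jan2000"
1 "OMG" "15jun2003"
1 "OBE" "31dec2010"
2 "BTH" "10mar2001"
2 "DTH" "02feb2005"
2 "OBE" "31dec2010"
end
gen EventDate = date(datestr, "DMY")
format EventDate %td
drop datestr

* Item 3: every individual has exactly one OBE record ...
bysort IndividualId: egen nOBE = total(EventCode == "OBE")
assert nOBE == 1
drop nOBE

* ... and the OBE date is the same for everybody
summarize EventDate if EventCode == "OBE", meanonly
assert r(min) == r(max)

* Items 4 and 5: sort and save under the site name
sort IndividualId EventDate
save yoursite_core.dta, replace
```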
    Step B: Create file with individual covariates:
    1. From “yoursite_core.dta”, create a temporary file “ind_residency.dta” with IndividualId and EventDate. This file should have only one observation per individual.
    2. Rename EventDate to dateIndCov.
    3. Sort “ind_residency.dta” by IndividualId and save in temporary file “ind_residency.dta”.
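
Items 1 to 3 of Step B might look as follows in Stata. The instructions do not say which record to keep for each individual; keeping the last one is an assumption made here for illustration, and the toy data are invented.

```stata
* Toy stand-in for yoursite_core.dta (hypothetical records)
clear
input long IndividualId str3 EventCode str9 datestr
1 "ENU" "01jan2000"
1 "OBE" "31dec2010"
2 "BTH" "10mar2001"
2 "OBE" "31dec2010"
end
gen EventDate = date(datestr, "DMY")
format EventDate %td
drop datestr

* Item 1: keep the identifier and the date, one record per individual
* (assumption: the last record of each individual is the one kept)
keep IndividualId EventDate
bysort IndividualId (EventDate): keep if _n == _N
isid IndividualId

* Item 2: rename the date
rename EventDate dateIndCov

* Item 3: sort and save
sort IndividualId
save ind_residency.dta, replace
```

The isid line stops with an error if any individual still has more than one record.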
    If covariates are recorded WITHOUT the actual date of change, that is, a change is only known to have occurred between two successive rounds in which these covariates were collected (up to the round just after the end of observation (OBE) event), sites must impute the date of change. The following steps are involved:
    4. Open “ind_covariates.dta” file and duplicate the record (command “expand”) corresponding to the observation just before a change occurred in the covariate (by comparing covariate value at _n and at _n+1), and on this new record, change dateIndCov to the suitable date (e.g. mid-term between date in dateIndCov [_n-1] and dateIndCov[_n+1], or any other date that you might find suitable for this individual: e.g. end of school year if this is a change of education).
    5. Create a variable EventChar and code EventChar==“ICV” for all changes in individual covariates.
    6. Recode variable EventChar into “OBS” for all other observations.
    7. Merge file “ind_covariates.dta” with file “ind_residency.dta” as many-to-one using IndividualId as a key:
    merge m:1 IndividualId using ind_residency.dta
    8. Delete records with no corresponding individual identifier in file “ind_residency.dta”:
    drop if _merge==1
    9. Sort by IndividualId and dateIndCov.
    10. Duplicate record just after censoring date (e.g. 1 Jan 2011), and on that record change dateIndCov to end of observation event date (censoring event, e.g. 1 Jan 2011), and recode EventChar==“OBE” for this new observation.
    11. Delete all records for which dateIndCov is greater than end of observation date (censoring event, e.g. 1 Jan 2011: this date should not coincide with any observation date).
    The result is a variable EventChar with ICV for all changes in individual covariates, OBS for observation time and OBE for the last observation. The file “ind_covariates.dta” contains the individual covariates as separate variables (e.g. education4, union5, religion, mobile2, job2…), the individual identifier (IndividualId) and LocationId (note: there can be several LocationId per individual). Once all covariates are confirmed to be recorded WITH a date of change (whether real or imputed), it also contains the dates of changes or events (dateIndCov) up to the end of observation (OBE) event (e.g. 1 Jan 2011).
    12. It is best to delete records with EventChar==“OBS” (unless you want to keep some variables related to data collection, e.g. fieldworkers' identifiers: beware of the file size!)
    13. Save file under the name “ind_covariates.dta”.
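
Items 4 to 13 of Step B can be sketched in one pass as below. The toy data, the helper variables (change, newrec, firstafter, isOBE) and the local macro obedate are hypothetical; the midpoint rule is just one of the imputation choices the instructions allow; and the duplicated record keeps the old covariate value, which is how the sketch reads item 4.

```stata
* Toy ind_residency.dta and ind_covariates.dta (hypothetical records)
clear
input long IndividualId str9 datestr
1 "31dec2010"
2 "31dec2010"
end
gen dateIndCov = date(datestr, "DMY")
format dateIndCov %td
drop datestr
save ind_residency.dta, replace

clear
input long IndividualId str9 datestr byte education4
1 "01jul2008" 1
1 "01jul2009" 1
1 "01jul2010" 2
1 "01jul2011" 2
2 "01jul2009" 3
2 "01jul2011" 3
9 "01jul2009" 1
end
gen dateIndCov = date(datestr, "DMY")
format dateIndCov %td
drop datestr
local obedate = td(01jan2011)

* Items 4-6: duplicate the record before a change, date it at the midpoint
sort IndividualId dateIndCov
by IndividualId: gen byte change = education4 != education4[_n+1] & _n < _N
expand 2 if change, generate(newrec)
sort IndividualId dateIndCov newrec
by IndividualId: replace dateIndCov = floor((dateIndCov[_n-1] + dateIndCov[_n+1]) / 2) if newrec
gen str3 EventChar = cond(newrec, "ICV", "OBS")

* Items 7-8: keep only individuals present in ind_residency.dta
merge m:1 IndividualId using ind_residency.dta
drop if _merge == 1
drop _merge

* Items 9-11: turn the first record after the censoring date into OBE
sort IndividualId dateIndCov
by IndividualId: gen byte firstafter = dateIndCov > `obedate' & (_n == 1 | dateIndCov[_n-1] <= `obedate')
expand 2 if firstafter, generate(isOBE)
replace dateIndCov = `obedate' if isOBE
replace EventChar = "OBE" if isOBE
drop if dateIndCov > `obedate'

* Items 12-13: drop plain observation records and save
drop if EventChar == "OBS"
drop change newrec firstafter isOBE
sort IndividualId dateIndCov
save ind_covariates.dta, replace
```

In the toy data, individual 9 is dropped because it has no match in “ind_residency.dta”, and individual 1 ends up with one ICV record (the imputed change in education4) followed by the OBE record. An individual with no covariate record after the censoring date would get no OBE record from this sketch.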
    Step C: Create file with household covariates:
    If household covariates are recorded WITHOUT the actual date of change, that is, a change is only known to have occurred between two successive rounds in which these covariates were collected (up to the round just after the end of observation (OBE) event), sites must impute the date of change. The following steps 6 to 8 are involved:
    6. Duplicate the record (command “expand”) corresponding to the observation just before a change occurred in the covariate (by comparing covariate value at _n and at _n+1), and on this new record, change dateHHCov to the suitable date (e.g. mid-term between date in dateHHCov [_n-1] and dateHHCov[_n+1], or any other date that you might find suitable for this household).
    7. Create a variable EventChar and code EventChar==“HCV” for all changes in household covariates.
    8. Recode variable EventChar into “OBS” for all other observations.
    For all cases (with or without actual date of change):
    9. Delete all records for which dateHHCov is greater than the end of observation date (censoring event, e.g. 1 Jan 2011: this date should not coincide with any observation date).
    10. Duplicate the last record, change dateHHCov on the new record to the end of observation event date (censoring event, e.g. 1 Jan 2011), and recode EventChar==“OBE” for this new observation.
    11. The result is a variable EventChar with HCV for all changes in household covariates, OBS for observation time and OBE for the last observation. The file “hh_covariates.dta” contains the household covariates as separate variables (e.g. toilet3, electricity2, houseownership2…), the household identifier (LocationId) and, once all covariates are confirmed to be recorded WITH a date of change (whether real or imputed), the dates of changes or events (dateHHCov) up to the end of observation (OBE) event (e.g. 1 Jan 2011).
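
The household steps 6 to 10 follow the same pattern as the individual ones and might be sketched as below. The toy data, the helper variables (change, newrec, last, isOBE) and the local macro obedate are hypothetical, and the midpoint is only one possible imputation rule.

```stata
* Toy household covariate records (hypothetical values)
clear
input long LocationId str9 datestr byte electricity2
10 "01jul2008" 0
10 "01jul2009" 0
10 "01jul2010" 1
10 "01jul2011" 1
20 "01jul2009" 1
20 "01jul2011" 1
end
gen dateHHCov = date(datestr, "DMY")
format dateHHCov %td
drop datestr
local obedate = td(01jan2011)

* Steps 6-8: duplicate the record before a change, date it at the midpoint
sort LocationId dateHHCov
by LocationId: gen byte change = electricity2 != electricity2[_n+1] & _n < _N
expand 2 if change, generate(newrec)
sort LocationId dateHHCov newrec
by LocationId: replace dateHHCov = floor((dateHHCov[_n-1] + dateHHCov[_n+1]) / 2) if newrec
gen str3 EventChar = cond(newrec, "HCV", "OBS")

* Step 9: drop records dated after the end of observation
drop if dateHHCov > `obedate'

* Step 10: duplicate the last remaining record and make it the OBE record
sort LocationId dateHHCov newrec
by LocationId: gen byte last = _n == _N
expand 2 if last, generate(isOBE)
replace dateHHCov = `obedate' if isOBE
replace EventChar = "OBE" if isOBE

drop change newrec last isOBE
sort LocationId dateHHCov
save hh_covariates.dta, replace
```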
    Other Processing

    - An important quality check for the coherence of event sequences is the construction of a matrix crossing (current) events with following-events (the event consistency matrix).

    - Checks were also performed on the logical sequencing of education data; corrections of errors in sequences, as well as imputation of missing values, were performed. Missing values were imputed at a particular time point only if some education measures had been collected for that individual. Individuals who had all education measures missing were retained in the sample with education category coded "missing" (value 9: "DK").
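
The education checks could be illustrated as follows. The page does not state which imputation rule was used, so the carry-forward rule below is purely an assumption, as are the variable names obsround, seqError and nObserved and the label name edulbl; only the value 9 ("DK") for individuals with no education measure at all comes from the text.

```stata
* Toy education records (hypothetical values); obsround is the data collection round
clear
input long IndividualId int obsround byte education
1 1 1
1 2 .
1 3 2
2 1 .
2 2 .
3 1 2
3 2 1
end
sort IndividualId obsround

* Flag sequences in which recorded education goes down between rounds
by IndividualId: gen byte seqError = education < education[_n-1] & !missing(education, education[_n-1])

* Count the non-missing education measures per individual
by IndividualId: egen nObserved = count(education)

* Assumed rule: carry the last observed value forward, only for
* individuals with at least one education measure
by IndividualId: replace education = education[_n-1] if missing(education) & _n > 1 & nObserved > 0

* Individuals with no measure at all are kept and coded 9 ("DK")
replace education = 9 if nObserved == 0
label define edulbl 9 "DK"
label values education edulbl

list, sepby(IndividualId)
```

In the toy data, individual 1 has the gap in round 2 filled, individual 2 is coded 9 throughout, and individual 3 is flagged for a decreasing sequence.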



    ©2013 INDEPTH Network, All Rights Reserved