Communities, Households and SARS-CoV-2 Epidemiology (CHASING) COVID Data Portal

This is the repository for publicly available CHASING COVID data. The dataset currently contains data from participants across the United States who are enrolled in a longitudinal cohort study of COVID-19.

Longitudinal Databases

The data are structured as a wide-format longitudinal dataset. For each time point, variable names include the visit number (“v0”, “v1”, etc.), for example labdiag_v9 and complete_v9.
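
The naming convention above can be used to regroup one participant's wide row by visit. The following is a minimal sketch, assuming visit-specific names end in "_v" plus a number (as in labdiag_v9 and complete_v9); the "pid" identifier column and the example values are hypothetical and are not taken from the dataset.

```python
import re

# Matches names that END in "_v<number>", e.g. "labdiag_v9" -> stem "labdiag", visit 9.
VISIT_SUFFIX = re.compile(r"^(?P<stem>.+)_v(?P<visit>\d+)$")


def wide_to_long(row, id_key="pid"):
    """Split one wide-format row into one record per visit.

    `id_key` is a hypothetical identifier column, not a documented variable.
    """
    by_visit = {}
    for name, value in row.items():
        match = VISIT_SUFFIX.match(name)
        if match is None:
            continue  # not a visit-suffixed column (e.g. the identifier)
        visit = int(match.group("visit"))
        by_visit.setdefault(visit, {})[match.group("stem")] = value
    return [
        {id_key: row[id_key], "visit": visit, **values}
        for visit, values in sorted(by_visit.items())
    ]


# Hypothetical example row.
example_row = {"pid": "A001", "complete_v0": 1, "complete_v1": 1, "labdiag_v1": 0}
long_rows = wide_to_long(example_row)
```

Not every variable fits this simple pattern. Cohort flags such as _cohort_v10 also end in a visit number but are not visit-level measurements, and names such as testtype_v9_1 carry the visit number in the middle, so both would need separate handling.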

Instructions

i. Cohort eligibility and enrollment/screening description:

Cohort eligibility was determined during cohort screening and enrollment visits. To be eligible for inclusion in the cohort, individuals had to: 1) reside in the U.S. or a U.S. territory; 2) be aged 18 years or older; 3) provide a valid email address; and 4) demonstrate early engagement in longitudinal study activities, including: a) completion of V1 (which provided the opportunity to consent for serologic testing); and b) completion of at least one additional screening visit in addition to V1 (i.e., V0 or V2) or provision of a baseline specimen for serologic testing (S1).
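
The eligibility criteria combine several conditions, and the nesting of "and" and "or" in criterion 4 is easy to misread. The sketch below writes it out as a single predicate. All field names (resides_in_us, age, has_valid_email, completed_v0, completed_v1, completed_v2, provided_s1) are hypothetical labels chosen for illustration; they are not variables in the dataset.

```python
def is_cohort_eligible(person):
    """Return True if a (hypothetical) screening record meets criteria 1-4."""
    # Criterion 4: V1 is required, plus at least one of V0, V2, or the S1 specimen.
    early_engagement = person["completed_v1"] and (
        person["completed_v0"] or person["completed_v2"] or person["provided_s1"]
    )
    return bool(
        person["resides_in_us"]        # criterion 1
        and person["age"] >= 18        # criterion 2
        and person["has_valid_email"]  # criterion 3
        and early_engagement           # criterion 4
    )


# Hypothetical records for illustration only.
with_specimen = {
    "resides_in_us": True, "age": 30, "has_valid_email": True,
    "completed_v0": False, "completed_v1": True,
    "completed_v2": False, "provided_s1": True,
}
v1_only = dict(with_specimen, provided_s1=False)

eligible = is_cohort_eligible(with_specimen)
not_eligible = is_cohort_eligible(v1_only)
```

In this sketch, completing V1 alone is not sufficient: the second record fails because it has neither an additional screening visit nor a baseline specimen.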

Cohort screening and enrollment. Cohort screening and enrollment began on March 28, 2020 and ended on August 21, 2020. We used internet-based strategies that are effective for recruiting and following large and geographically diverse online cohorts, including at-home specimen collection. Persons aged 18 years and above who resided in the U.S. or U.S. territories were eligible to join the study. Study participants were recruited via ads on social media platforms (e.g., Facebook, Instagram, and Scruff), Qualtrics Panel, or via referral to the study (anyone with knowledge of the study was allowed to invite others to participate). Facebook and Instagram advertisements were developed in English and Spanish and were geographically targeted to people currently residing in the U.S. and U.S. territories who were 18 or older. By relying on participants’ personal networks through referrals, we aimed to bolster recruitment of persons >59 years of age, who were important to represent in the cohort because of their risk but may not be as active on social media as younger persons.

Study staff systematically monitored cohort demographics and proactively adjusted advertisement strategies as needed to balance geographic and sociodemographic characteristics of respondents. For example, strategies could shift to prioritize recruitment of older persons if that demographic was poorly represented.

Interested participants were directed to a pre-enrollment survey (hosted by Qualtrics) to be completed in their web browser on a computer or on a mobile device. A consent form described the study, plans for follow-up assessments, and future study opportunities, including the possibility of receiving SARS-CoV-2 serologic testing as part of the study. The consent form also described the incentive schedule: a drawing for $100 (with 20 winners) for completing the pre-enrollment survey (V0), and gift cards ranging from $5 to $30 for all participants for completion of subsequent surveys and antibody testing.

ii. Analysis Cohort filter: identify persons enrolled into longitudinal follow-up

All participants who ever completed a survey (or were screened) were kept in the longitudinal database. However, we typically define the analysis cohort using the following filters.

  • Variables: _cohort; _cohort_v10; _cohort_v11

  • Interpretation:

    • _cohort = 1: completed V1 and engaged in at least one follow-up study activity (including consent to antibody specimen collection) by V6

    • _cohort_v10 = 1: completed V0/V1 and engaged in at least one follow-up study activity (including consent to antibody specimen collection) by V10

    • _cohort_v11 = 1: completed V0/V1 and engaged in at least one follow-up study activity (including consent to antibody specimen collection) by V11
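
Applying one of these filters amounts to keeping the rows where the chosen flag equals 1. The following is a minimal sketch, assuming the data have been read into a list of dictionaries (for example with csv.DictReader) and that blank cells appear as empty strings or None; the "pid" column and the example rows are hypothetical.

```python
def is_flag_set(value):
    """True when a cell holds 1, whether stored as 1, 1.0, or the string "1"."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        return False  # blank or non-numeric cell


def analysis_cohort(rows, flag="_cohort"):
    """Keep only rows whose cohort flag (e.g. "_cohort_v11") equals 1."""
    return [row for row in rows if is_flag_set(row.get(flag))]


# Hypothetical rows for illustration only.
rows = [
    {"pid": "A001", "_cohort": "1", "_cohort_v11": "1"},
    {"pid": "A002", "_cohort": "", "_cohort_v11": "1"},
    {"pid": "A003", "_cohort": "", "_cohort_v11": ""},
]
cohort_v6 = analysis_cohort(rows)
cohort_v11 = analysis_cohort(rows, flag="_cohort_v11")
```

Which flag to use depends on how far into follow-up the analysis extends; the sketch only shows the mechanics of the filter.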

iii. Differentiating types of missingness: no participation (the survey was not completed), not eligible to answer the question as specified in the skip conditions (participants could not see the question), and question seen but not answered.

For each missingness reason, the entries below give the data coding, how to identify the missing pattern, and an example.

  • No participation

    • Data coding: no coding; the value is blank.

    • How to identify this missing pattern: restrict the analysis to survey completers using the survey-specific completion filter (e.g., complete_v9 = 1).

    • Example: there are 11505 missing values for the labdiag_v9 variable. For 8745 of these 11505, complete_v9 was not equal to 1, which means 8745 missing values were due to no participation in the V9 follow-up survey.

  • Not eligible to answer the question as specified in the skip condition

    • Data coding: no coding; the value is blank.

    • How to identify this missing pattern: check the Qualtrics codebook for the skip condition, e.g., display this question if “Did you receive a viral test or an antibody test?” = “Viral test (PCR or rapid test)”.

    • Example: there are 11505 missing values for the labdiag_v9 variable. For 2678 of these 11505, complete_v9 was equal to 1 and testtype_v9_1 was missing, which means 2678 missing values arose because participants did not report a viral test and therefore could not see the question “Since you completed your last survey, were any of your viral (PCR or rapid) test(s) positive/reactive?”

  • Participant saw the question but did not answer it (skipped the question)

    • Data coding: -99

    • How to identify this missing pattern: this type is easily identified if a level of -99 exists in the variable. However, this type of missingness is not necessarily coded in the variable itself. The pattern can also be identified if a level of -99 appears in the response to the embedded question (the question used in the skip condition).

    • Example: there are 11505 missing values for the labdiag_v9 variable. For 82 of these 11505, complete_v9 was equal to 1 and testtype_v9_1 was -99, which means 82 missing values arose because participants saw but did not answer the question “Did you receive a viral test or an antibody test?” and therefore could not see the question “Since you completed your last survey, were any of your viral (PCR or rapid) test(s) positive/reactive?”

  • Participant saw and answered the question, choosing the level “Not applicable”

    • Data coding: 97

    • How to identify this missing pattern: this type is easily identified if a level of 97 exists in the variable. However, this type of missingness is not necessarily coded in the variable itself. The pattern can also be identified if a level of 97 appears in the response to the embedded question.

    • Example: the question “In the past month, have you experienced a significant personal loss of income as a result of COVID-19?” has a level of 97 among its answers. Participants could choose 97 to indicate that the question is not applicable to them. Analysts should decide for themselves how to handle this pattern.
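
The labdiag_v9 example can be expressed as a small classification routine. The sketch below uses only the three variables named in the example (labdiag_v9, complete_v9, testtype_v9_1) and assumes that blanks are read in as None or empty strings and that codes are stored as integers. The rows are hypothetical, and the category labels are illustrative wording rather than codes from the dataset.

```python
from collections import Counter

BLANK = (None, "")


def missingness_reason(row):
    """Classify why labdiag_v9 is blank for one participant row."""
    if row.get("labdiag_v9") not in BLANK:
        return "not blank"  # a value is present (it may still be -99 or 97)
    if row.get("complete_v9") != 1:
        return "no participation"  # the V9 survey was not completed
    gate = row.get("testtype_v9_1")
    if gate in BLANK:
        return "not eligible (skip condition)"  # no viral test reported
    if gate == -99:
        return "gating question seen but not answered"
    return "other"  # blank despite a gate response; inspect manually


# Hypothetical rows for illustration only.
rows = [
    {"complete_v9": None, "testtype_v9_1": None, "labdiag_v9": None},
    {"complete_v9": 1, "testtype_v9_1": None, "labdiag_v9": None},
    {"complete_v9": 1, "testtype_v9_1": -99, "labdiag_v9": None},
    {"complete_v9": 1, "testtype_v9_1": 1, "labdiag_v9": 0},
]
counts = Counter(missingness_reason(row) for row in rows)
```

The order of the checks mirrors the example: participation is checked first, then the skip condition, then a skipped gating question. In the README's example these three groups (8745, 2678, and 82) account for all 11505 blank values.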