The analysis of United States survey data with Python

$25.00

Learn how to use the American Community Survey (ACS) and the Annual Social and Economic Supplement (ASEC) published by the Census Bureau and the Bureau of Labor Statistics.

Credit card and PayPal payments accepted.

This is a guided tour of the Annual Social and Economic Supplement (ASEC) and the American Community Survey (ACS) datasets using Python for transformations, statistics, and graphics. The datasets are published by the Bureau of Labor Statistics and the Census Bureau.

New second edition with two supplements:

  • Combining ASEC and ACS datasets to obtain joint estimates.

  • Processing ACS 5-year datasets on Amazon Web Services.

See the descriptions at the bottom of the page.

The ASEC and ACS are survey datasets published annually by the Bureau of Labor Statistics and the Census Bureau, respectively. A typical ASEC dataset contains about 75,000 households, about 80,000 families, and between 150,000 and 200,000 people. A yearly ACS dataset typically includes about 750,000 household records and 3,000,000 person records.

These datasets provide key information to business, policymakers, and the media. The ASEC and ACS include data on person, family, and household demographics, work status, income, educational attainment, health coverage, occupation, industry, geography, migration, ethnicity, citizenship, and ancestry.

The Python package is a self-contained, ready-to-use environment for processing and analyzing the ASEC and ACS datasets. It includes column definitions and code tables developed from the ASEC and ACS data dictionaries for the years 2000 through 2022. Links to formatted datasets are provided to make project configuration as easy as possible. The transformation scripts included in the package can be applied to future releases of the datasets. The package also includes methods for transformations, standard errors, and advanced graphics for visualizing results. It can be used to jump-start any ASEC/ACS project. Results from the package are verified against Census-published estimates and Stata.

The textbook is a guided analysis of the social and economic topics covered by the surveys. There are 35 examples that cover key topics with report-ready graphics. The technical tasks associated with these survey datasets are clearly explained and demonstrated. The example scripts are templates that can be adapted to ASEC and ACS topics not specifically covered by the examples.

Individuals can use this package to get up to speed with the ASEC and the ACS using Python. The package was written for those who will use the ASEC and ACS in their work but have little or no experience with these datasets. The package focuses on the 1-year ASEC and ACS datasets, but the concepts and techniques can be applied to ACS 5-year estimates and to data obtained from open-source APIs such as census and censusdis.

The techniques required for effective analysis of the ACS and ASEC are clearly demonstrated:

  • Processing public use files and generating labelled datasets using the published column definitions and code tables.

  • Parallel processing for the ACS datasets.

  • Using Census and BLS estimates published from production files.

  • Standard errors using replicate weights, including the ACS Variant Replicate Tables.

  • Standard errors using Generalized Variance Functions.

  • Dealing with dollar-denominated fields.

  • Using geographic codes.

  • 35+ examples visualize results for ASEC and ACS topics, including demographics, geography, labor force characteristics, earnings, health care coverage, migration, commute times, and other topics.

  • Reusable methods for standard errors, data analysis, and visualization that have been tested on 20 years of data and verified against Stata.
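As an illustration of the replicate-weight technique listed above, here is a minimal sketch in plain Python. It is not the package's own code. It assumes the published successive difference replication formula, in which the standard error is the square root of 4/R times the sum of squared differences between each replicate estimate and the full-sample estimate (R is 80 for the ACS and 160 for the ASEC). The function names and the toy records are hypothetical, and the toy data uses only 4 replicate weights to keep the arithmetic visible.

```python
import math


def weighted_total(indicator, weights):
    """Weighted count of records where the 0/1 indicator is 1."""
    return sum(flag * w for flag, w in zip(indicator, weights))


def replicate_standard_error(full_estimate, replicate_estimates):
    """Standard error from replicate estimates: sqrt(4/R * sum((rep - full)^2))."""
    r = len(replicate_estimates)
    if r == 0:
        raise ValueError("at least one replicate estimate is required")
    squared_diffs = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return math.sqrt(4.0 / r * squared_diffs)


# Hypothetical toy data: three person records, one full-sample weight each,
# and four sets of replicate weights (a real ACS file carries 80 sets).
in_group = [1, 0, 1]
full_weights = [100, 200, 150]
replicate_weight_sets = [
    [110, 190, 150],
    [90, 210, 160],
    [100, 200, 140],
    [105, 195, 155],
]

estimate = weighted_total(in_group, full_weights)
replicates = [weighted_total(in_group, w) for w in replicate_weight_sets]
se = replicate_standard_error(estimate, replicates)
print(f"estimate = {estimate}, standard error = {se:.2f}")
```

The same function applies to any statistic (a mean, a ratio, a median) as long as the statistic is recomputed once per set of replicate weights.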

The package gives the researcher full control over the data and analysis.

There are two new supplements:

  • Combining ASEC and ACS datasets to obtain joint estimates. There is significant overlap in the topics covered by the Annual Social and Economic Supplement (ASEC) and the American Community Survey (ACS). Both datasets include fields for basic demographics, education, health insurance, earnings, labor force status, geography, occupation, industry, and other topics. The ASEC and ACS are independent surveys, and a household rarely participates in both. Taken together, these two datasets represent a very large survey sample. The technical supplement extends the techniques presented in the main text to obtain joint estimates.

  • Processing ACS 5-year datasets on Amazon Web Services. The American Community Survey 5-year datasets are too large to process on personal hardware. The person and household datasets are each published as four files denoted 'a', 'b', 'c', and 'd'. The household files are about 1 GB each and the person files are 2-3 GB each. This supplement demonstrates how to extend the techniques presented in the main text to process 5-year ACS datasets on Amazon Web Services.
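To give a sense of what a joint estimate from two independent surveys can look like, here is a minimal sketch using inverse-variance weighting. This is a standard way to pool independent estimates and is an assumption on my part. It is not necessarily the method used in the supplement. The function name and the input numbers are hypothetical and do not come from either survey.

```python
import math


def combine_independent_estimates(est_a, se_a, est_b, se_b):
    """Pool two independent estimates of the same quantity.

    Each estimate is weighted by the inverse of its variance, so the
    more precise estimate contributes more to the pooled value.
    """
    if se_a <= 0 or se_b <= 0:
        raise ValueError("standard errors must be positive")
    w_a = 1.0 / se_a ** 2
    w_b = 1.0 / se_b ** 2
    combined = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    combined_se = math.sqrt(1.0 / (w_a + w_b))
    return combined, combined_se


# Made-up inputs: the same rate estimated from a smaller survey
# (larger standard error) and from a larger survey (smaller standard error).
rate_small, se_small = 0.10, 0.02
rate_large, se_large = 0.12, 0.01

joint_rate, joint_se = combine_independent_estimates(
    rate_small, se_small, rate_large, se_large
)
print(f"joint estimate = {joint_rate:.4f}, standard error = {joint_se:.4f}")
```

The pooled standard error is smaller than either input, which is the practical appeal of treating the two surveys as one larger sample. The approach only makes sense when both surveys measure the quantity with comparable definitions and the samples are independent.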

Also see the tutorials and articles posted here.

Send questions or comments to talk@mlbridgeresearch.com or contact me on LinkedIn: www.linkedin.com/in/thomasmckennon59983316a.

Date Published: 8.2023
Pages: 200
Version: v2.7