An API request requires variable codes of the selected variables and ucgid (geographic code). This notebook is used to retrieve variable codes in ACS tables.

The U.S. Census Bureau provides variable metadata in JSON format. These JSON format documentations were downloaded and saved as .json files in project folder:

  • b_variables.json was downloaded from Detailed Tables Variables.
  • s_variables.json was downloaded from Subject Tables Variables.
  • dp_variables.json was downloaded from Data Profiles Variables.

These 3 files are used to look up variable codes based on table IDs and variable labels.

In [1]:
import json
import pandas as pd
In [2]:
# Load the feature metadata for Detailed Tables
with open ('b_variables.json', 'r') as file:
    b_variables = json.load(file)
b_var = b_variables['variables']
b_var = pd.DataFrame(b_var).transpose().reset_index()
b_var.rename(columns={'index':'var_code'}, inplace=True)
b_var.tail()
Out[2]:
var_code label concept predicateType group limit predicateOnly hasGeoCollectionSupport attributes required
36719 B20005I_071E Estimate!!Total:!!Female:!!Worked full-time, y... Sex by Work Experience in the Past 12 Months b... int B20005I 0 NaN NaN B20005I_071EA,B20005I_071M,B20005I_071MA NaN
36720 B08113_054E Estimate!!Total:!!Worked from home:!!Speak oth... Means of Transportation to Work by Language Sp... int B08113 0 NaN NaN B08113_054EA,B08113_054M,B08113_054MA NaN
36721 B06009_006E Estimate!!Total:!!Graduate or professional degree Place of Birth by Educational Attainment in th... int B06009 0 NaN NaN B06009_006EA,B06009_006M,B06009_006MA NaN
36722 B05009_012E Estimate!!Total:!!Under 6 years:!!Living with ... Age and Nativity of Own Children Under 18 Year... int B05009 0 NaN NaN B05009_012EA,B05009_012M,B05009_012MA NaN
36723 B02021_001E Estimate!!Total Groups Tallied: American Indian and Alaska Native Alone or in ... int B02021 0 NaN NaN B02021_001EA,B02021_001M,B02021_001MA NaN
In [3]:
# Load the feature metadata for Subject Tables 
with open ('s_variables.json', 'r') as file:
    s_variables = json.load(file)
s_var = s_variables['variables']
s_var = pd.DataFrame(s_var).transpose().reset_index()
s_var.rename(columns={'index':'var_code'}, inplace=True)
s_var.tail()
Out[3]:
var_code label concept predicateType group limit predicateOnly hasGeoCollectionSupport attributes required
18642 S2402_C02_035E Estimate!!Male!!Full-time, year-round civilian... Occupation by Sex for the Full-Time, Year-Roun... int S2402 0 NaN NaN S2402_C02_035EA,S2402_C02_035M,S2402_C02_035MA NaN
18643 S1002_C04_004E Estimate!!60 years and over!!Percent distribut... Grandparents float S1002 0 NaN NaN S1002_C04_004EA,S1002_C04_004M,S1002_C04_004MA NaN
18644 S0601_C02_009E Estimate!!Native; born in state of residence!!... Selected Characteristics of the Total and Nati... float S0601 0 NaN NaN S0601_C02_009EA,S0601_C02_009M,S0601_C02_009MA NaN
18645 S2411_C01_012E Estimate!!Median earnings (dollars)!!Civilian ... Occupation by Sex and Median Earnings in the P... int S2411 0 NaN NaN S2411_C01_012EA,S2411_C01_012M,S2411_C01_012MA NaN
18646 S2302_C04_014E Estimate!!Percent Families with own children u... Employment Characteristics of Families int S2302 0 NaN NaN S2302_C04_014EA,S2302_C04_014M,S2302_C04_014MA NaN
In [4]:
# Load the feature metadata for Data Profiles 
with open ('dp_variables.json', 'r') as file:
   dp_variables = json.load(file)
dp_var = dp_variables['variables']
dp_var = pd.DataFrame(dp_var).transpose().reset_index()
dp_var.rename(columns={'index':'var_code'}, inplace=True)
dp_var.tail()
Out[4]:
var_code label concept predicateType group limit predicateOnly hasGeoCollectionSupport attributes required
1384 DP02PR_0001PE Percent!!HOUSEHOLDS BY TYPE!!Total households Selected Social Characteristics in Puerto Rico int DP02PR 0 NaN NaN DP02PR_0001PEA,DP02PR_0001PM,DP02PR_0001PMA NaN
1385 DP03_0039PE Percent!!INDUSTRY!!Civilian employed populatio... Selected Economic Characteristics float DP03 0 NaN NaN DP03_0039PEA,DP03_0039PM,DP03_0039PMA NaN
1386 DP02_0098E Estimate!!YEAR OF ENTRY!!Population born outsi... Selected Social Characteristics in the United ... int DP02 0 NaN NaN DP02_0098EA,DP02_0098M,DP02_0098MA NaN
1387 DP04_0095PE Percent!!SELECTED MONTHLY OWNER COSTS (SMOC)!!... Selected Housing Characteristics float DP04 0 NaN NaN DP04_0095PEA,DP04_0095PM,DP04_0095PMA NaN
1388 DP02_0036PE Percent!!MARITAL STATUS!!Females 15 years and ... Selected Social Characteristics in the United ... float DP02 0 NaN NaN DP02_0036PEA,DP02_0036PM,DP02_0036PMA NaN

Variables in this project are gathered from 3 different types of tables: Subject Tables, Detailed Tables, Data Profiles.

A variable code starts with a table ID with the first letter representing the table type: "S" for Subject Tables, "B" for Detailed Tables, "DP" for Data Profiles. For examples: S1701_C03_001E, DP03_0009PE, etc.

Relevant variables are selected by browsing tables and variable labels on https://data.census.gov/.

To retrieve variable code of a selected variable, run the below code and enter table ID, column names, and labels as prompted (case insensitive; leading and trailing spaces are accepted).

Example: to get variable code for poverty rate:

  • Input:
    Table ID: s1701
    Parent column: percent below poverty level
    Child column: estimate
    Parent label: Population for whom poverty status is determined
    Child label 1: Press Enter to skip
    Child label 2: Press Enter to skip
    Child label 3: Press Enter to skip
  • Output:
    Table: S1701
    Variable label: Estimate!!Percent below poverty level!!Population for whom poverty status is determined
    Variable code: S1701_C03_001E

My Local GIF

In [5]:
def input_prompt ():
    '''Prompt user to input table ID and label's components'''
    table = input('Table ID:')
    parent_col = input('Parent column:')
    child_col = input('Child column:')
    parent_label = input('Parent label:')
    child_label_1 = input('Child label 1:')
    child_label_2 = input('Child label 2:')
    child_label_3 = input('Child label 3:')
    
    # A list of label's components
    components = [child_col, parent_col, parent_label, child_label_1, child_label_2, child_label_3]
    
    return table, components


def get_var_label (components):
    ''' Return variable's label by assembling user's inputs'''
    label = []
    for i in components:
        # Filter out empty elements
        if i != '':
            label.append(i.strip())
    
    if len(label) <= 1:     # A label should have at least 2 components
        var_label = None
    else:
        var_label = '!!'.join(label)    # Components are assembled and separated by "!!"
    
    return var_label


def get_var_id (table, components):
    '''Call get_var_label function to retrieve variable label.
        Validate the format of table ID and variable label.
        If the table ID and variable label are valid, look up variable code based on table ID and variable label.'''
    
    var_label = get_var_label(components)
    
    if not table or not table.lower().startswith(('b','s','dp')) or var_label == None:
        raise ValueError('Invalid inputs!')
    else:
        if table.lower().startswith('b'):
            result = b_var.loc[(b_var['group'].str.contains(table, case=False)) & (b_var['label'].str.lower() == var_label.lower())]
        elif table.lower().startswith('s'):
            result = s_var.loc[(s_var['group'].str.contains(table, case=False)) & (s_var['label'].str.lower() == var_label.lower())]
        elif table.lower().startswith('dp'):
            result = dp_var.loc[(dp_var['group'].str.contains(table, case=False)) & (dp_var['label'].str.lower() == var_label.lower())]

        print(f'Table: {result['group'].values[0]}')
        print(f'Variable label: {result['label'].values[0]}')
        print(f'Variable code: {result['var_code'].values[0]}')


if __name__=='__main__':
    try:
        table, components = input_prompt()
        get_var_id(table, components)
    except ValueError as excpt:
        print(excpt)
    except:
        print('Variable not found! No matching table or label.')
Table: S1701
Variable label: Estimate!!Percent below poverty level!!Population for whom poverty status is determined
Variable code: S1701_C03_001E

References¶

  • More information about ACS table types and variables: LINK
  • The get_var_id function is adapted from a Stack Overflow thread: LINK
In [ ]: