The Tsinghua-UChicago Joint Research Center for Economics and Finance recognizes the importance of providing researchers with access to the tools needed to perform ground-breaking research. The following data, purchased with Joint Center grants, will be available to University of Chicago and Tsinghua University scholars in economics.
Tsinghua China Data Center (CDC)
Tsinghua China Data Center (CDC) was jointly established by the National Bureau of Statistics and Tsinghua University in July 2016. Drawing on the National Bureau of Statistics’ strengths in official statistics and Tsinghua University’s strengths in academic research, CDC aims to build a first-class team for economic and social data development and research.
On June 26, 2018, the National Bureau of Statistics officially launched its first pilot program on the development and application of microdata at CDC. CDC is committed to promoting the in-depth development and application of Chinese government microdata in academic research and to supporting more original, high-level research output. The microdata currently developed and applied include: annual microdata on the financial performance of industrial enterprises above designated size for 2012-2016; microdata on household income and expenditure for 2005, 2008, 2010 and 2013; microdata on household income and expenditure and living conditions for 2013; population census microdata for 2000 and 2010; microdata from the 2015 1% population sampling survey; microdata from the third national economic census; microdata from the third national agricultural census; and microdata from the enterprise tracking survey for 2014-2016.
- Annual microdata on the financial performance of industrial enterprises above designated size
The survey data on the financial performance of industrial enterprises above designated size include annual data for Beijing, Shanghai and Zhejiang from 2012 to 2016. From the industrial enterprises above designated size in these three provincial-level regions, a 10% random sample of enterprises is drawn, and the enterprise data are de-identified. The data consist of three major parts (balance sheets, income statements and other items) and mainly include 32 variables such as total assets, fixed assets, current assets, total liabilities, current liabilities, paid-in capital, main business income, main business cost, operating profits, total profits and number of employees.
- Microdata on household income and expenditure
The microdata on household income and expenditure cover the income and expenditure of 10,000 urban and 10,000 rural households in each of 2005, 2008, 2010 and 2013, totaling 80,000 households. The microdata have been anonymized, and only provincial information is retained. Variables cover household information including basic situation, per capita income, per capita consumption, basic facilities, etc., with 40 variables for urban households (cities and towns) and 32 variables for rural households.
- Microdata on household income and expenditure, and living conditions
The micro database of the 2013 household income and expenditure survey contains household-level sample data from the national survey of household income, expenditure and living conditions. That survey combines stratified, multi-stage, probability-proportional-to-size (PPS) and random equidistant (systematic) sampling to select village-level organizations and households. A total of 160,000 households were selected nationwide, covering about 1,800 counties.
This database is a sub-sample of 20,000 households, obtained by systematic sampling from the sample households of the 2013 survey of household income, expenditure and living conditions. There are 37 variables for each household, covering per capita disposable income, per capita consumption expenditure, durable goods ownership, etc.
In order to prevent the disclosure of personal information, the database has been anonymized by deleting geographic information that can identify households and retaining only provincial information.
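As an illustration of the systematic (random equidistant) sub-sampling described above, the following is a minimal sketch, not the procedure actually used by the National Bureau of Statistics or CDC. The function name `systematic_sample`, the seed, and the use of integer household IDs as a sampling frame are all assumptions made for the example; only the frame size (160,000) and sub-sample size (20,000) come from the text.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select n units at equal intervals after a random start (illustrative)."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    rng = random.Random(seed)
    step = len(units) / n         # sampling interval (may be fractional)
    start = rng.random() * step   # random start within the first interval
    last = len(units) - 1
    # Take the unit at each equally spaced position; clamp as a safety guard.
    return [units[min(int(start + i * step), last)] for i in range(n)]


# Hypothetical sampling frame: 160,000 household IDs in list order.
households = list(range(160_000))
sample = systematic_sample(households, 20_000, seed=1)
print(len(sample))  # 20000
```

In practice the frame is usually sorted by region or another stratifying variable before the interval is applied, so that the sub-sample spreads evenly across the ordering; whether and how that was done here is not stated in the source.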
- Census microdata
The micro database of the population census contains individual-level sample data from China’s population censuses of 2000 and 2010. China’s population census uses two types of census forms, long and short. The short form includes items that reflect the basic situation of the population, while the long form includes all items in the short form plus items on economic activity, marriage and family, childbirth and housing. The long form was filled in by 10% of households, and the short form by the remaining households. This database is obtained by systematic sampling of the long-form census data, with a sampling ratio of 0.995%, accounting for 0.95‰ of the total population (excluding people in active military service and people whose permanent residence is difficult to identify). The main structural variables of the data are highly representative of the full population and can meet the needs of most academic research.
The data covers relevant variables such as gender, age, ethnic group, education level, industry, occupation, migration, social security, marriage, childbirth, death, housing, etc.
In order to prevent the disclosure of personal information, the database has been anonymized.
- Microdata of population sampling survey
This micro database contains individual-level sample data from China’s 1% population sampling survey in 2015. The survey used stratified, two-stage, probability-proportional cluster sampling to select 89,147 survey blocks in 2,977 counties, 33,671 towns, and 85,365 villages across 31 provinces. A total of 21.31 million permanent residents were surveyed, accounting for 1.55% of the country’s total population.
This database is obtained by systematic sampling from the 2015 1% population sample survey data and accounts for 1‰ of the total population (excluding people in active military service and people whose permanent residence is difficult to identify). The database covers 432,447 household records and 1,371,252 person records.
For ease of use, the database provides weight variables (household weights and person weights). After weighting, the data for different regions are converted to a unified national sampling ratio and can be compared directly.
The database is compiled on a permanent-resident-population basis, and it covers 60 variables such as name, sex, age, ethnic group, household registration, education level, industry, occupation, migration, social security, marriage, childbirth, death, housing, etc.
In order to prevent the disclosure of personal information, the database has been anonymized.
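To show how the household and person weights mentioned above are typically used, here is a minimal sketch of a weighted mean and a weighted population total. It assumes that each person weight can be read as the number of people a record represents; the records, ages and weight values below are invented for illustration, and the database’s own documentation should be consulted for the actual weight definitions and variable names.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values, using weights as expansion factors."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical person records: (age, person_weight). Not real survey data.
records = [(30, 900.0), (45, 1100.0), (60, 1000.0)]
ages = [age for age, _ in records]
weights = [weight for _, weight in records]

mean_age = weighted_mean(ages, weights)  # weighted estimate of mean age
population_estimate = sum(weights)       # estimated number of people represented

print(mean_age)             # 45.5
print(population_estimate)  # 3000.0
```

An unweighted mean of the same three ages would be 45.0; the difference shows why regional figures should be compared only after weighting.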
- Microdata on national economic census
The microdata of the third national economic census are a 10% sample of all the enterprises and legal entities in the third national economic census, excluding financial and railway entities. In order to prevent the disclosure of personal information, the database has been anonymized. The data consist of three parts: non-industrial enterprises, industrial enterprises below designated size and industrial enterprises above designated size.
- Non-industrial enterprises
- Variables of non-industrial enterprises include industry code, location code, establishing time, number of employees, type of registration, equity structure, business status, total income, main business income, total assets, total R&D personnel, ratio of R&D personnel to total employees, R&D expenditure, and number of R&D projects.
- Industrial enterprises below designated size
- Industrial enterprises below designated size are industrial enterprises with legal-person status and annual main business income of less than 20 million RMB. Variables include industry category codes, total assets, paid-in capital, main business income, and the number of employees at the end of the period.
- Industrial enterprises above designated size
- Industrial enterprises above designated size are industrial enterprises with legal-person status and annual main business income of 20 million RMB or more.
- Microdata on the national agricultural census
The microdata on the third national agricultural census include the census of agricultural business organizations, the census of agricultural business households, the census of farmers, the census of administrative villages and the census of villages and towns.
- Agricultural business organizations census
- This database consists of observations randomly selected from the census data of more than 2 million agricultural business entities registered in the third national agricultural census, including 42.06 million observations of about 20,000 agricultural business entities.
- The sample contains the basic information of agricultural business entities, employment of personnel engaged in agriculture, forestry, animal husbandry and fishery, and corresponding service business, farmland owned or operated and related land rights circulation, crop planting, forest land confirmed or operated and related forest land rights circulation, livestock or poultry raising, grassland confirmed or operated, aquatic product breeding or fishing, agricultural machinery owned, and the characteristics of agriculture, forestry, animal husbandry and fishery production and operation.
- Agricultural business household census
- This database consists of observations randomly selected from the census data of nearly 4 million agricultural business households registered in the third national agricultural census, including 320 million census data points for about 40,000 agricultural business households of designated size.
- The sample includes the basic situation of agricultural business households with designated size, housing and living conditions, employment of personnel engaged in agriculture, forestry, animal husbandry and fishery, and corresponding service business, farmland confirmed or operated and related land rights circulation, crop planting, forest land confirmed or operated and related forest land rights circulation, livestock or poultry raising, grassland confirmed or operated, aquatic product breeding or fishing, agricultural machinery owned, and the characteristics of agriculture, forestry, animal husbandry and fishery production and operation.
- Agricultural household census
- This database consists of observations randomly selected from the census data of about 230 million agricultural households registered in the third national agricultural census, including 960 million census data points for about 230,000 households.
- The sample includes the basic situation of agricultural households, housing and living conditions, farmland confirmed or operated, crop planting, forest land confirmed or operated, livestock or poultry raising, grassland confirmed or operated, aquatic product breeding or fishing, agricultural machinery owned, and the characteristics of agriculture, forestry, animal husbandry and fishery production and operation.
- Administrative village census
- This database consists of observations randomly selected from the census data of about 600,000 administrative villages nationwide, including about 60,000 village-level records.
- This data sample includes basic attributes and infrastructure variables such as administrative village types, topography, national characteristic landscape tourism villages, infrastructure allocation, etc. The data cover the basic situation of administrative villages, year-end population, social security, basic social services, land management and circulation, farmland water conservancy, characteristic farming industry, livestock and poultry centralized farming association, village collective economic organization finance, village cadres, etc.
- Township census
- This database consists of observations randomly selected from the census data of about 40,000 township-level entities nationwide, including about 4,000 township-level records.
- This data sample includes basic attribute variables such as township type, township attribute, topography and geomorphology, and covers variables such as basic situation of township entities, transportation facilities, population, economy, trade market, education, culture and health, living security, public utilities, etc.
- All of the above microdata have been de-identified.
- Microdata of enterprise survey
- Data sources
- Since October 2014, the National Bureau of Statistics, together with the former State Administration for Industry and Commerce, has drawn sample entities from the small and micro enterprises and self-employed businesses newly registered with the administrative authorities after the reform of the industrial and commercial registration system took effect, and has carried out quarterly follow-up surveys of them. This micro database comes from survey data from the third quarter of 2014 to the fourth quarter of 2016.
- This database consists of three major parts: basic information, basic performance, and questionnaire variables for each entity. It mainly covers nine variables: entity ID, industry code, total assets, operating income, number of employees, operating situation, employment, preferential policies enjoyed, and policies of greatest concern.
Hao, T., Sun, R., Tombe, T., & Zhu, X. (2020). The effect of migration policy on growth, structural change, and regional inequality in China. Journal of Monetary Economics.
Official Website: http://www.tcdc.sem.tsinghua.edu.cn/
Address: Room 224, Shunde building, Tsinghua University
Beijing Daokou Fintech Technology Co., Ltd.
Beijing Daokou Fintech Technology Co., Ltd. (hereinafter referred to as “Daokou Fintech Technology”) is a fintech platform incubated by the Tsinghua PBC School of Finance Fintech Lab. Established in 2018, Daokou Fintech Technology focuses on collecting enterprise big data. Relying on data analysis technology and artificial intelligence algorithms, Daokou Fintech Technology has built knowledge graphs that extend from panoramic enterprise data to industrial chain data, and these are applied in scenarios such as risk management, precision marketing, and research analysis. In March 2019, one of the company’s main products, “Xinghe Big Data Service” (hereinafter referred to as “Xinghe Platform”), was officially launched. Xinghe Platform has collected multi-dimensional data on 180 million enterprises, along with nearly 1,000 industrial chain knowledge graphs and enterprise analysis reports.
Xinghe Platform has gradually integrated government public data, data from third-party data platforms, and Daokou Fintech Technology’s own data, such as policy interpretation, industry research and online public opinion. At present, the platform covers more than 180 million enterprises across more than 600 dimensions. Data on the Xinghe Platform mainly include the following:
- Enterprise Registration Information
- Basic information: unified social credit code, taxpayer identification number, date of establishment, industry, business address, business type, registration status, registration changes, number of employees, senior executives, branches, etc.;
- Equity information: shareholder information (shareholders, holding amount, holding percentage), equity look-through chart (up to 10 levels available), ultimate beneficiary information, actual controller information, etc.;
- Annual report: modification information, shareholder information, basic information of the company, financial information published by the company in the annual report;
- Social security information: the number of employees covered by social security, the contribution base for the five social insurance schemes, etc.;
- Investment: industries invested by the company, the number and details of subsidiaries invested by the company, etc.
- Legal Risks
- Judgement documents: list of judgement documents (including heading, case number, cause of action, date of submission, etc.), details of judgement documents (including raw files of the judgement documents);
- Court announcement: court order announcement, case filing information, etc.;
- Enforcement against dishonest judgment debtors: information on the entity subject to enforcement, enforcing court, date, etc.;
- Others: including judicial auction, judicial aid, bribery violations, etc.
- Operational Risks
- Abnormal operation: reasons for being placed on the abnormal operation list, regulatory authority, listing time, removal time, reasons for removal from the abnormal operation list, etc.;
- Business administrative penalty: execution authority, publishing time, penalty content, etc.;
- Tax risks: tax arrears information, major tax violations, tax credit information, etc.;
- Serious violations of the law and dishonesty: the name of the publishing authority, publishing time and reason;
- Equity pledge: the pledged share of equity, pledgor, pledgee, etc.;
- Mortgage of movable property: types of movable property, mortgage amount, etc.;
- Equity freeze: the status and date of the share freeze, the enforcing court, details, etc.
- Innovation Information
- Patent information: name, type, application date, application number, publishing date, patent inventor, classification number, detailed information, applicant, agent, etc.;
- Trademark information: trademark list, registration number, classification code, classification name, application date, exclusive right period, trademark details, trademark picture link, company name, latest update time, applicant address, priority date, agent, etc.;
- Copyright information: holding company, version number, type, name, completion date, etc.;
- Software copyright information: registration number, holding company, version number, type, name, etc.;
- Domain name information: homepage URL, filing license number, website name, nature of organizer, etc.;
- Other innovation output information: Undertaking government scientific research projects, leading or participating in the development of standards, whether in a strategic emerging industry, participating in scientific and technological reports funded by national scientific and technological projects, obtaining scientific and technological qualifications, etc.;
- R&D information: R&D expenditure, R&D personnel, laboratories, research workstations, talent introduction, etc.;
- Other: Exhibition information (information about the company participating in the exhibition, date, location, level, exhibition name, etc.).
- Business operations
- Administrative license: administrative licenses obtained by the enterprise, the issuing organization, the type of license, the issuing date, etc.;
- Sales analyses based on invoice information: monthly average invoicing, invoicing interval, invoicing fluctuation, names of main commodities, types of commodities sold, distribution of sales areas, distribution of commodity tax rates, upstream and downstream analyses, related party transactions, etc.;
- Bidding information: the types and contents of bidding obtained by the company;
- Customs information on import and export: the issuing authorities and dates of import and export licenses, etc.;
- Land information: land mortgage, land purchase, land transaction, etc.;
- Recruitment information: company name, position, salary, job description, working experiences and educational background requirements, working location, announcement date, source, etc.;
- Public sentiment information: news about the company, including heading, publish date, introduction, source, original link, public sentiment label, etc.
- Enterprise Analysis Report (Generated by Daokou Fintech Technology)
- Basic report: based on information on business registration and administration, judicial risks, intellectual property rights, etc.;
- Operation report: combines invoice information with information on business registration and administration, judicial risks, intellectual property rights and administrative penalties to analyze the company’s production capacity, profitability and upstream and downstream situation;
- Credit risk report: quantifies the company’s credit based on information on business registration and administration, judicial risks, tax, financial situation, etc., and presents the results of the quantitative risk evaluation as a report.
Center on Data and Governance, Tsinghua University
Established in May 2015, the Center on Data and Governance (CDG) is a research institution at Tsinghua University that integrates data science and social science. Relying on the interdisciplinary integration of social science, data science and computer science, CDG aims to comprehensively promote the collection, analysis, and application of government open data, economic data, and social media data. In June 2020, the School of Social Science (SSC) at Tsinghua University officially approved the upgrade of CDG to an affiliated research institution of the School.
CDG is equipped with one database and two research platforms to serve research projects related to politics and economics, and provides systematic support to researchers for their teaching and scientific research.
China’s Government Administration Database
Up to now, the data developed and applied in this database include:
- China Judicial Database
- It covers China’s judicial documents of criminal cases, civil cases, administrative cases, compensation cases, enforcement cases, etc.
- Firm Registration Database
- This database comes from China’s State Administration for Industry and Commerce. It covers information on the business registration of Chinese enterprises, including registration date, registered capital, location, industry, status of the firm (either existing or bankrupt), shareholders, change records, investment, key personnel, annual business reports, etc.
- Database of China’s Policy and Regulatory Documents
- It covers policies and regulations of all levels of government agencies in China, including ministries, provinces, cities, counties, etc.
- Government Response Database
- It collects government response data from multiple sources, including social media, government hotlines, online government inquiry platforms, and official government websites.
- Anti-corruption Database
- China Social Governance Survey 2015
- The China Social Governance Survey 2015 (CSGS2015) is one of the highest-quality surveys available on the political attitudes of the Chinese public. Its questionnaire draws on the existing research literature and is designed to fit China’s national conditions. It asks respondents about their confidence in the political system, interpersonal trust, value orientations, and perceptions of the overall governance of China along five dimensions: political participation, National People’s Congress (NPC) supervision, government response, political freedom, and corruption.
- Survey data on Urban Governance in China in 2015 and 2018
- Survey data on Local Party and Government Cadres in China from 2013 to 2017
- Survey data on Deliberative Democracy in 2015, 2017 and 2019
Government Big Data Platform
The Government Big Data Platform focuses on government big data. This platform provides researchers with collection and analysis services for government data from various sources, including government websites, online government inquiry platforms, government hotlines, legal and policy documents, etc.
Social Governance Big Data Platform
The Social Governance Big Data Platform aims to provide data services that help researchers integrate theoretical analysis with qualitative, quantitative and experimental methods to perform high-level research. The platform has run questionnaire surveys and experimental research on a long-term basis and provides researchers with support such as programming and data analysis.
WeChat Channel: PoliticalScience-THU