Industries Needs: Data Management Life Cycle

Introduction

Transportation inefficiencies cost money, reduce safety, increase pollution-causing emissions, and take time away from people’s lives. The solution is not always to build more roads, create parking spaces, or add more bus routes. Sometimes, the better solution is to do more with the infrastructure we already have, and for that, you need information on which to base decisions.

Data are raw material representing actions or transactions in the real world that are recorded, classified, processed, stored, and potentially repurposed to create information that supports policy and decision making. The end user interprets the meaning to draw conclusions and identify implications of the information (1). In transportation, decision-makers use data to assess alternatives, weigh tradeoffs, and to evaluate performance. Stakeholders use data to assess the comprehensive performance of a transportation organization. The public uses data to inform their personal decisions and travel behavior.

Transportation data are a key component for policy research and performance management. Examples of data that reflect the wide range of data sources used for transportation purposes include the following:

· Crash records that reveal incident location and contributing factors.

· Probe speed and volume data to inform congestion mitigation and management efforts.

· Census data to show demographic and socioeconomic characteristics, population distribution, and change.

· Roadway inventory to estimate the supply of and demand for infrastructure.

· Travel behavior data to identify patterns and trends.

· Public opinion data to reflect attitudes and awareness of transportation issues.

· Road weather information data to alert travelers to roadway conditions and traffic operations.

The volume of transportation data expands continually. Technological advances are happening at a rapid pace, generating large amounts of data that appear to be valuable in understanding the issues that form transportation policy. As data continues to expand, it is important for policy makers to know the value of data and the return on investment for collecting and analyzing data.

The importance of data in this era of data-driven decision making, the swift increase in the volume of data due to improved collection methods, new uses such as automated and connected vehicles, and increased public interest in the factors underlying decision making together suggest that policymakers may have an interest in understanding and addressing the quantity, quality, creation, collection, storage, retention, privacy, security, and availability of transportation data across agencies.

This paper attempts to bring clarity to the topic of data—to simplify and organize it into something that is digestible. By better understanding the data landscape as a whole, policy makers can better understand the role of each piece of data as it relates to transportation, as well as in other areas. This report provides a roadmap of data management to be used for high-level prioritization for future research efforts.

The report is organized as follows:

· Data Management Life Cycle. This section describes the process used to categorize data topics and develop the data management life cycle, as well as introduces the components of the data management life cycle.

· Data Management Life Cycle Phases. This section describes each of the seven phases in the data management life cycle in detail.

· Cross-cutting Issues in Data Management. This section describes seven issues that cut across all phases of data management.

· Summary. This section summarizes the data life cycle and provides suggestions for future research efforts.

Data Management Life Cycle

Accurate, timely data is an important input for making accurate, timely transportation planning and policy decisions. However, the management of data is challenging and must be addressed over the life span of a piece of data. Transportation agencies already manage many of their physical assets: roads, bridges, signs, lights, etc. Data can be treated as an asset in the same way. Because data is a key component in decision-making, it is important to manage and maintain it carefully so that agencies know what data exists, where it is located, how it can be obtained, and whether it is accurate. Furthermore, data is often expensive to procure, so agencies will want to make sure the right data is available to support key decisions.

Data as a topic is so broad that it can be overwhelming and difficult to grasp all the elements it encompasses. Through a cyclical and iterative process, researchers at TTI identified possible aspects and uses of data in the transportation context, developed a framework of what data exists, and then condensed the topics into cross-cutting issues and main themes in the data management life cycle. This life cycle presents a way to organize data, characterize its nature and value over time, and identify policy implications of cross-cutting data management issues.

Illustrated in Figure 1, the data management life cycle describes key aspects of data from creation to destruction, as well as cross-cutting issues that affect data in each phase of the life cycle. Data moves through seven phases in its life cycle:

· Collect.

· Process.

· Store and secure.

· Use.

· Share and communicate.

· Archive.

· Destroy or re-use (concurrent phases).

Researchers at TTI also identified seven cross-cutting issues in the data management life cycle. These issues arise and can change over the life cycle, and they affect each of the seven life cycle phases (Figure 1). Some cross-cutting issues are pivotal to each life cycle phase, and all have policy implications. The cross-cutting issues are:

· Purpose and value.

· Privacy.

· Data ownership.

· Liability.

· Public perception.

· Security.

· Standards and Data Quality.

Data Management Life Cycle Phases

The stages of the data management life cycle—collect, process, store and secure, use, share and communicate, archive, reuse/repurpose, and destroy—are described in this section.

Collect

The first phase of the data management life cycle is data collection. Data is being collected for a myriad of reasons, such as operations, maintenance, planning, performance measures, or to address a certain policy goal or objective. The key factors in this stage are:

· Techniques and methods for collection.

· Public versus private sector data generation, procurement, and partnerships for data collection.

· Impact of technology and big data.

Techniques and Methods for Data Collection

Transportation data relate to people, vehicles, assets, physical infrastructure, and travel. Users of the information derived from the data are key stakeholders in the data collection and analysis process. Depending on the needs of the user, the data collection type and methodology vary at different geographic and jurisdictional levels. Data collection systems should be designed to meet both internal and external user needs and the agency’s legislative mandates. The planning and design of a data collection system includes establishing data needs and objectives, identifying data providers, planning and designing methods to meet data needs and objectives, and documenting data collection and designs (2).

Data collection methods should be determined based on factors such as funding availability, data quantity, length of collection period, research questions, and target populations. Future research should be focused on examining ways that public agencies can harness big data from private entities.

Partnerships for Data Collection

Data collection can be challenging for transportation agencies with limited time, resources, and technology. The process of identifying and collecting accurate and useful data requires technical expertise and well-developed tools. A public-private partnership in this case could help to facilitate data collection and enhance agencies’ ability to be data-driven development practitioners and decision makers. Currently, public-private partnerships in Texas mostly focus on the State’s facilities and infrastructure projects. There is a lack of formal guidance on potential collaboration in data collection. Before entering a public-private partnership, it is critical to be aware of existing data ownership policies and to clearly describe rights and obligations so data integrity is not compromised.

There are multiple ways vehicle data is collected by public and private sector sources. In the public sector, sensors on roads put in place by local and state DOTs collect vehicle speed and volume data that is not associated with the personal identity of the vehicle owner. In the private sector, individual vehicle telematics data is obtained via cellular backhaul transmissions by telecommunications companies who have agreements in place to route the data to automotive manufacturers who then use it for various purposes.

Impact of Technology and Big Data

Collection and exploitation of large data sets for transportation operations, planning, and safety purposes is not new; in the past, data were acquired, processed, and discarded. Now, with low-cost and widespread sensing across all modes and types of infrastructure, data are acquired, processed, and stored for some later, currently unknown use.

It is important to understand what data have been generated and how to use them to shape the future of transportation in Texas. Millions of devices have been equipped with Internet of Things (IoT) technology. The IoT refers to “the network of physical objects or ‘things’ embedded with electronics, software, sensors, and network connectivity, which enables these objects to collect and exchange data” (3). Application of the IoT extends to all aspects of transportation systems (i.e., the vehicle, the infrastructure, and the driver or user). It automates data collection and generates a massive pool of data (Big Data) from diverse locations that is aggregated very quickly. For example, Google has crowdsourced the collection of real-time traffic data via mobile phones. If the Google Maps app is installed on a mobile phone with GPS capabilities enabled, Google can collect the location and travel data of the phone user in real time. When Google combines the speeds collected from all the phones on a road, it is able to evaluate live traffic conditions and send that information back to users for navigation.

Federal and state laws and rules place requirements on the collection of certain data related to various aspects of transportation. One example in the field of new transportation-related technologies is the recently developed body of law governing data collection by automated license plate reader (ALPR) systems mounted on police cars, road signs, and traffic lights. These systems capture geo-located and temporal data aligned with personally identifiable information (PII). Given the PII involved, various states have created data collection requirements for ALPRs. Table 1 describes some of these laws (4).

Data sets of this magnitude and complexity, often referred to as Big Data, are proliferating in part because data is gathered continuously by ubiquitous information-sensing mobile devices, GPS devices, remote sensing technologies, software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks. Examples of Big Data sources in transportation research include probe data, GPS data, Bluetooth sensors, mobile devices, and cameras.

Process

Data processing is the second phase of the data management life cycle and plays a primary role in converting the data collected in the first phase into meaningful information. When data is collected, it may not be in a readily usable form. The process starts with discovering inconsistencies and other anomalies in the raw data, followed by data cleansing to improve data quality. Users can then conduct analyses to produce meaningful information that may lead to the resolution of a problem or the improvement of an existing situation. The key factors in this stage include:

· Data quality metrics.

· Quality assurance and quality control.

· Data processing techniques.

Data Quality Metrics

Data quality metrics identify data errors and erroneous data elements and measure the impact of various data-driven processes. A data quality assessment enables transportation agencies to understand the condition of their safety and traffic data, for example, in relation to expectations. It could assist agencies in understanding how effectively data represents the objects, events, and concepts it is designed to represent. AASHTO has developed seven core data principles to have consistency among states, listed in Figure 2.
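As a minimal sketch of what such metrics can look like in practice, the Python example below computes two simple measures, completeness and validity, over a small set of probe speed records. The record layout, the metric definitions, and the 0 to 120 mph plausibility range are illustrative assumptions, not the AASHTO principles or any agency's adopted standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeedRecord:
    """Hypothetical probe speed record; field names are illustrative."""
    segment_id: Optional[str]
    speed_mph: Optional[float]


def completeness(records: List[SpeedRecord]) -> float:
    """Share of records in which no field is missing."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if r.segment_id is not None and r.speed_mph is not None
    )
    return complete / len(records)


def validity(records: List[SpeedRecord],
             min_mph: float = 0.0,
             max_mph: float = 120.0) -> float:
    """Share of reported speeds that fall inside an assumed plausible range."""
    speeds = [r.speed_mph for r in records if r.speed_mph is not None]
    if not speeds:
        return 0.0
    valid = sum(1 for s in speeds if min_mph <= s <= max_mph)
    return valid / len(speeds)


# Made-up sample data: one implausible speed, one missing segment,
# and one missing speed.
records = [
    SpeedRecord("A", 55.0),
    SpeedRecord("A", 250.0),
    SpeedRecord(None, 40.0),
    SpeedRecord("B", None),
]

print(f"completeness: {completeness(records):.2f}")
print(f"validity: {validity(records):.2f}")
```

Scores of this kind could be tracked over time or compared across data sources, although the thresholds that count as acceptable would depend on the intended use of the data.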

Data Processing Techniques

Transportation agencies, research entities, and private companies are seeking to tap the information power within big data to support more effective decision making. Big data poses challenges to traditional management and analysis methods, which lack the capability to handle the complexity of the data sources and the amount of information. To extract and mine massive transportation data from various databases, it is important to understand and use advanced data processing techniques and tools. The Bureau of Transportation Statistics provides general instructions on data processing in the Guide to Good Statistical Practice in the Transportation Field (6). This guide includes principles and guidelines on data editing and coding, handling missing data, production of estimates and projections, and data analysis and interpretation.

Stakeholders can save time and increase capacity by using advanced tools that enable more efficient and accurate real-time transportation data processing. For example, researchers at TTI have studied potential methodologies to realize the benefits from big data resources (7). One of the best alternatives is cloud computing. Cloud computing is described as “a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand” (8). Alternatively, MapReduce is “a computation process that can process a large data set simultaneously utilizing multiple nodes (processors) in a cloud platform or in a local cluster environment.”
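The MapReduce pattern can be illustrated with a small, single-process Python sketch that computes the average probe speed per roadway segment. This is only an illustration of the map, shuffle, and reduce steps; a production framework would distribute these steps across many nodes. The segment names, speeds, and function names are made up for the example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit (segment, (speed, 1)) so sums and counts can be combined."""
    segment_id, speed_mph = record
    return segment_id, (speed_mph, 1)


def shuffle(pairs):
    """Shuffle: group the mapped values by key (roadway segment)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce: add up the (speed_sum, count) pairs for one segment."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)


def average_speed_by_segment(records):
    """Run map, shuffle, and reduce in sequence in a single process."""
    grouped = shuffle(map(map_phase, records))
    averages = {}
    for segment_id, values in grouped.items():
        speed_sum, count = reduce_phase(values)
        averages[segment_id] = speed_sum / count
    return averages


# Hypothetical probe observations: (segment, speed in mph).
records = [("I-35", 60.0), ("I-35", 40.0), ("US-290", 30.0)]
averages = average_speed_by_segment(records)
print(averages)
```

Because the reduce step only adds pairs of partial sums and counts, the partial results for a segment can be computed on separate nodes and merged in any order, which is the property that allows this kind of job to be spread across a cluster.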

Technological advances allow for the generation of increasingly large amounts of data collected from information sensing devices such as smartphones, GPS devices, software logs, cameras, microphones, and other sensors. As the volume of data increases, transportation professionals need to have the technical skills and computer processing power to effectively use this robust data.

Store and Secure

The third phase of the data management life cycle is data storage and security. When data is secure and appropriately regulated, there is greater trust and confidence in its use. Data must be trustworthy and safeguarded from unauthorized access, whether malicious, fraudulent or erroneous. Transportation agencies at all levels of government (federal, state, and local) hold a wealth of diverse data sets, but it is often stored in different databases that are incompatible with each other or difficult to find.

The key factors in this stage include:

· Storage cost and maintenance.

· Storage and retention policies.

The global volume of electronically stored data is doubling every two years (9). The rapid growth in the volume of transportation data due to innovation in data generation and collection leads to great demand for cost-effective storage technologies. More and more organizations are considering outsourcing storage services or cloud storage options because the availability of cloud computing resources opens up possibilities for users to transition to purchasing access to computing power and storage space as a service instead of maintaining it themselves. This way, providers are responsible for the performance, reliability, and scalability of the computing environment, while users can concentrate on data analysis and production (10).
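To show what a two-year doubling period implies for storage planning, the short Python sketch below projects a data volume forward under constant exponential growth. The 10 TB starting volume is a hypothetical figure chosen for illustration, and the assumption that an individual agency's data grows at the global rate is a simplification.

```python
def projected_volume(initial_tb: float,
                     years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Project stored volume assuming it doubles every doubling period."""
    return initial_tb * 2 ** (years / doubling_period_years)


# Hypothetical agency holding 10 TB today.
for years in (2, 4, 6, 10):
    print(f"after {years} years: {projected_volume(10.0, years):.0f} TB")
```

Under these assumptions the volume grows eightfold in six years and thirty-two-fold in ten, which helps explain the interest in storage purchased as a service.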

Use

Data use is the fourth stage in the life cycle. Transportation data is used in numerous ways to study, plan, design, construct, operate, and monitor our transportation system. It helps planners understand traveler behavior and helps policymakers identify ways to make the system more efficient and cost-effective. These different uses are what make data an asset. The nearly unlimited range of possible uses also creates challenges throughout the data life cycle, from data collection to data destruction. How data can and will be used depends on how it is collected, processed, and stored.

A model of how data is used by departments of transportation in the United States to inform their activities, developed by Cambridge Systematics, is shown in the diagram in Figure 3 (11).

There are several issues to consider when reviewing data use for transportation purposes, including:

· Larger and more detailed data sources can create challenges for analytic capacity among researchers and processing tools, as well as challenges sharing data across an enterprise or with partners.

· As access and availability of data increase, users need to weigh this against their ability to process and interpret the data.

· Balancing valid data uses with security concerns about access to data.

· Privacy and proprietary restrictions on the use of collected data.

To address the transportation problems the state is currently facing, it is important to first determine the questions to be answered and the information needed. For instance, in order to prioritize transportation funding and meet individual travel needs, it is important to understand travel behaviors and patterns. The U.S. DOT has been collecting traveler information across the nation through the National Household Travel Survey since 1969. The data are used by Congress, policy makers at all levels of government, and transportation planners to understand the performance of the current transportation system and develop strategic plans for the future. The survey has also contributed to improving safety, reducing congestion, tracking air quality improvements, and planning for future transportation investments (12). In Texas, TxDOT started a comprehensive travel survey program in the 1990s.

A Big Data Scan of the Texas A&M Transportation Institute in 2015 found that large or complex data sets are used by transportation researchers in topic areas such as mobility, safety and operations, operations and energy, and transportation modeling. However, the research also suggested that there were technical, institutional, and financial limitations on the capacity for researchers to explore new uses of data. Deployment strategies for organizations to capitalize on advancing data analytics include supporting collaboration with commercial data providers and private entities specializing in big data analytics, building internal capacity to leverage existing data sources, and offering data management as a service to clients and partners (13).

Share and Communicate

As transportation organizations work with more stakeholders and external partners to incorporate them into decision making, planning, and operations, there is an increased pressure to also share data. Shared data can help improve decisions since agencies and researchers will be able to obtain a more comprehensive picture of the impacts their decisions have based on contributions of new data sets from a wider variety of sources, both internally and externally. At the same time, shared data will also drive a decision maker to require more quality and clarity from data gathered, which will likely result in fewer sources of more accurate and timely managed data for decision-making.

Data sharing is the fifth stage of the data management life cycle. Open sharing of information and the release of information via relevant agreement must be balanced against the need to restrict the availability of classified, proprietary, and sensitive information. There are several issues to consider for sharing and communicating data, including:

· Communication and transparency.

· Coordination within the agency, with external partners, with the private sector, and with the public.

· Costs and maintenance of shared data.

· Interoperability across systems (tolling, connectivity, telematics).

· Access.

Communication and Transparency

Sharing public datasets is part of government efforts to communicate with the public, maintain transparency, and engage the public in decision-making processes in transportation. For example, various transportation-related data sets TxDOT utilizes in planning and decision-making are found on their web page titled “OneDOT Data Shop.” The site is an example of a state effort to provide information about its public data sets. It provides a set of basic identifying information about each data set, including the title, description, contact person, source, and update frequency.

Coordination

As transportation organizations partner more often with stakeholders and external partners in decision making, planning, and operations, there is an increased interest and need to share data. Furthermore, agencies are increasingly asked to “do more with less.” Transportation systems management and operations (TSMO) is a long-standing transportation activity in which transportation agencies collect roadway data to help manage congestion on roadways, improve incident response, and provide traffic information services. As computing technology improved, state and regional entities developed advanced traffic management systems (ATMS) that combine data from multiple public agencies and through contracts with the private sector to coordinate the transportation data networks of an entire region, across modes, jurisdictions, and organizations. Data-intensive TSMO activities often coordinated among these agencies include:

· Traffic flow performance monitoring.

· Incident detection and response.

· Traffic signal timing coordination.

· Integration of road weather information systems with the provision of traveler information.

This type of regional coordination adds complexity to the coordination of information technology system procurement and design, and how data is shared across multiple organizations, both public and private (14).

Costs and Maintenance of Shared Data

As data becomes increasingly available, data sharing can be a tool to combat rising costs for data storage, processing, and analysis and to identify cost-effective and efficient transportation solutions.

Access

Sharing data is a key step in reducing the burden on staff time as data becomes more accessible. Users must have access to the data critical to their duties and functions. Wide access to properly processed and packaged data can lead to efficiency and effectiveness in decision-making, and affords timely responses to information requests.

The benefits of sharing information and the release of information with public and private partner agreements must be balanced against the need to restrict the availability of classified, proprietary, and sensitive information. How transportation data is or is not shared has broad policy implications, particularly in cross-cutting areas such as data ownership, security, privacy, and liability. For example, the rapidly expanding presence in new vehicles of telematics systems that collect and transmit vehicle data presents potential privacy risks for drivers (15).

Telematics systems incorporate numerous on-board communications, positioning, and computing technologies to provide services such as navigation, infotainment, remote diagnostics, and transmission of vehicle performance data for insurance purposes. At the same time, these products can collect and transmit vehicle data that is of value to public transportation agencies and to private entities with a commercial interest in developing and selling data associated with smart phones, GPS, and Bluetooth technologies in vehicles (16). Existing consumer protection and insurance policies may have to address the privacy issues raised by vehicle telematics and, in the future, highly automated and connected vehicles.

Archive

The sixth stage of data management is archiving. Data archiving is “the process of identifying and moving inactive data out of current production systems and into specialized long-term archival storage systems.” This serves two objectives: 1) moving inactive data out of active systems and databases to optimize current performance, and 2) storing inactive data in specialized archival systems that are more cost-effective and allow for retrieval when needed (17). A data archive may also be called a data bank or data center. There are several issues to consider when reviewing data archiving, including:

· Storage costs.

· IT needs.

· Cost/benefit.

· State and federal requirements to backup data.

· Other issues related to data backup.
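The first objective of archiving, identifying inactive data so it can be moved out of production systems, can be sketched in a few lines of Python. The record layout and the one-year inactivity threshold below are hypothetical; an actual threshold would follow the agency's retention policy and any state or federal requirements.

```python
from datetime import date, timedelta


def split_for_archive(records, today, inactive_after_days=365):
    """Separate records into active and archive-eligible lists.

    A record is treated as inactive when its last access date is older
    than the cutoff. The threshold is an illustrative assumption.
    """
    cutoff = today - timedelta(days=inactive_after_days)
    active = [r for r in records if r["last_accessed"] >= cutoff]
    to_archive = [r for r in records if r["last_accessed"] < cutoff]
    return active, to_archive


# Made-up data set catalog entries.
records = [
    {"id": 1, "last_accessed": date(2024, 5, 1)},
    {"id": 2, "last_accessed": date(2021, 3, 15)},
]

active, to_archive = split_for_archive(records, today=date(2024, 6, 1))
print("active:", [r["id"] for r in active])
print("archive:", [r["id"] for r in to_archive])
```

In practice the archive-eligible records would then be copied to lower-cost storage and verified before being removed from the active system, so that retrieval remains possible when needed.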

Archiving is not a new concern for transportation data users, but the complexity and costs of data archiving are growing as data is collected faster and in larger amounts. Data archiving requires a variety of software, database, and electronic data storage technologies. It also requires staff to maintain systems, develop reports, and provide IT and administrative support. In 2011, the cost to operate and maintain one multiple-agency data archive, with data fusion and visualization systems, was estimated at approximately $400,000 per year. Estimates for other statewide or regional data archive systems range from $300,000 to over $4 million, as costs can vary widely depending on the size and features of a system (18).

Transportation planners and policymakers are increasingly focused on data-driven decision-making and benchmarking for performance monitoring. Archived transportation data can enable better benchmarking and tracking of improvements to the transportation system over time. Crash records are an example of transportation data that is often archived and used for numerous purposes. Under the Texas Transportation Code, TxDOT is responsible for maintaining crash data submitted by Texas law enforcement officers. Since 2007, TxDOT has been developing the Crash Records Information System (CRIS), the state repository for vehicle crash data, into a comprehensive electronic crash data system (19). In 2011, TxDOT launched the Crash Reporting and Analysis for Safer Highways (CRASH) internet application to speed up the transfer of crash data from law enforcement agencies to TxDOT by collecting reports electronically. Use of CRASH allows for faster and more efficient submission of data from the office or a patrol car. Quality of data entry is ensured through CRASH training, which is scheduled as part of the set-up process for each agency (20). In 2015, the retention period for Texas CRIS data was increased from 5 years to 10 years (21).

Data archiving plays a critical role in ongoing efforts to design and manage intelligent transportation systems (ITS) across the United States. ITS is “an operational system of various technologies that, when combined and managed, improve the operating capabilities of the overall system” (22). For ITS purposes, data archiving is defined as “the systematic retention and re-use of transportation data that is typically collected to fulfill real-time transportation operation and management needs” (23). ITS programs primarily focus on collecting real-time operational data that can be used for incident management, traffic signal control, or travel information systems. In addition to providing more and better information for operations, this data can have other uses and avoid costly efforts to re-collect data for special studies (24). For decades, ITS programs have included efforts to support and expand the use of ITS data for transportation planning and other needs. In 1999, the Archived Data User Service (ADUS) was added to the National ITS Architecture, the documentation that outlines how users should design and use ITS. ADUS was added to facilitate the use of ITS-generated data for multiple purposes. Figure 4 demonstrates the various ways data from ITS sources can be used for many other transportation purposes.

ITS programs are expanding to accommodate emerging connected and automated vehicle and infrastructure technology (25). The U.S. DOT Intelligent Transportation Systems Joint Program Office’s ITS Strategic Plan 2015–2019 notes several focus areas related to data archiving, including the following:

· Enterprise data management focused on capturing, managing, and integrating “big data” from the range of ITS enabled technologies.

· Focus on ensuring interoperability within increasingly complex technical systems by evolving standards and architectures to ensure that technological advancements are reflected and the required backward compatibility and interoperability are maintained.

Reuse/Repurpose or Destroy

At the end of the data life cycle, data are ultimately either processed for reuse or repurposing, or destroyed when their utility has been exhausted. With data reuse and repurposing, the data management life cycle is no longer linear but circular. When data are appropriately handled, they can have a long life with many uses beyond the original one and serve projects yet to be planned. Data reuse refers to using the same data more than once for the same purpose; data repurposing means using the same data sets to serve a purpose different from their original one.

Data destruction refers to the process of removing information in a way that renders it unreadable (for paper records) or irretrievable (for digital records), so that it cannot be accessed or used for unauthorized purposes. Failure to do so can lead to serious breaches of data-protection and privacy policies, compliance problems, and storage issues.

Reuse/Repurpose

The repurposing of data enables the continuous extraction of value from data and leverages the data to solve new problems. This could also help to justify the expense of accumulating and managing huge volumes of data when organizations are monetizing or productizing their information assets. For instance, backup and archive data has represented the most comprehensive data set in many organizations. But this data is rarely used for any purposes other than restoring deleted, corrupted, or lost data. Mining the existing data for potential value creates the opportunity to turn some of the cost of backup into a resource.

There is no end in the data life cycle as far as data being continually reused and repurposed, creating new data products that may be processed, distributed, discovered, analyzed, and archived. The IoT generates a massive volume of data. For example, connected cars are equipped with more than 100 sensors creating a constant stream of data by measuring location, performance, physical parameters, and driving behavior, often several times per second. According to a 2015 Hitachi whitepaper, a single connected car will produce more than 25 GB of data per hour of use (26). These data can be analyzed in real-time to keep the vehicle’s performance, efficiency, and safety in check. It also provides vital feedback for cities and states about traffic volume and roadway design.

Experts say the value of vehicles will likely pale in comparison to the riches from our cars’ data (27). The data could be reused and repurposed for different goals. Car manufacturers can analyze vehicle operating performance, assess automotive telematics, and track the performance of electrical components in different models. Vehicle owners can be notified of scheduled maintenance and repair requirements. Providing better, more proactive maintenance support using captured data may ultimately be a factor in what makes one type of vehicle more attractive to consumers. Additionally, insurance companies can potentially track speed and driving behavior in order to reward good drivers with lower premiums. Law enforcement may use connected car data to investigate accidents or to prosecute criminals.

Reusing and repurposing data is encouraged, especially for open data. Government agencies like the US DOT make their data available for public use and encourage its use by a variety of means, including “hackathons” that promote interest among analytically oriented innovators and entrepreneurs. The US DOT also encourages innovative use of its data by commercial ventures, including the use of Federal Aviation Administration data for private pilot iPads and the analysis of truck incident data patterns by insurance companies. Such uses may not always be specified by DOT’s enabling legislation (28).

Despite the advantages of data reuse and repurposing, there are barriers such as data quality and the perceived risk of reusing others’ data. Secondary data analysis entails reusing data created in previous projects for new purposes, so the trustworthiness of data sources can be an issue. Oftentimes, there is a lack of documentation of what has been done to the datasets, which becomes a significant disincentive to reusing data. Not knowing how the data were collected and cleaned poses a potential risk of generating invalid results. Standardization of procedures and formats could help to address the problem. For example, if cleaning procedures within an organization, or even across a subject field, were standardized, recipients of data would know exactly how the data were cleaned. Also, when data follow a standard format, they can be easily integrated for analysis by different users. Right now, extra effort is required from secondary users to preserve data interconnectedness in order to guarantee the data’s understandability and informative value.
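One way to reduce the documentation gap described above is to ship a small, machine-readable record of how a dataset was collected and cleaned alongside the data. The following is a minimal sketch of that idea, not an established standard: the class names, field names, and the sample dataset are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CleaningStep:
    """One documented cleaning operation (hypothetical schema)."""
    name: str
    description: str
    rows_before: int
    rows_after: int


@dataclass
class ProvenanceRecord:
    """Describes where a dataset came from and what was done to it."""
    dataset: str
    source: str
    collected_by: str
    steps: List[CleaningStep] = field(default_factory=list)

    def log(self, name: str, description: str,
            rows_before: int, rows_after: int) -> None:
        """Append a cleaning step in the order it was applied."""
        self.steps.append(
            CleaningStep(name, description, rows_before, rows_after))

    def to_json(self) -> str:
        """Serialize the record so it can travel with the dataset."""
        return json.dumps(asdict(self), indent=2)


# Illustrative usage with made-up values.
record = ProvenanceRecord(
    dataset="example_probe_speeds",
    source="hypothetical roadside sensors",
    collected_by="example agency",
)
record.log("drop_duplicates", "Removed exact duplicate rows", 1000, 990)
record.log("filter_speed", "Dropped rows with negative speed", 990, 985)
print(record.to_json())
```

A secondary user who receives this record can see each step in order, along with how many rows it removed, before deciding whether the data suit a new purpose.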

Destroy

The destruction method is normally selected based on the underlying sensitivity of the data being destroyed, or the potential harm they could cause if they are recovered or inadvertently disclosed. There are several issues to consider if the owner chooses to destroy the data, including:

· Determining when data should die, how to make the choice, and who makes the choice.

· Document retention laws.

· Data “statute of limitations.”

· Usefulness of historical data versus the need for new, updated data sources.

There are three main effective data destruction approaches (29): overwriting, degaussing, and physical destruction. Each of these techniques has benefits and drawbacks (30) discussed in the following list:

· Overwriting: Using software or hardware appliances to overwrite data. This is one of the most common ways to address data remanence. The advantage of this approach is that it is relatively easy and low-cost. It can be used selectively on part or all of a storage medium. On the downside, it takes a long time to overwrite an entire high-capacity drive. It also may not be able to sanitize data from inaccessible areas such as host-protected areas. In addition, this process can only be used when the storage media is not damaged and is still writable.

· Degaussing: Using a device to remove or reduce the magnetic field of a storage disk or drive. The key advantage of this approach is that it makes data completely unrecoverable. However, a strong degausser can be expensive and heavy. It may even produce collateral damage to vulnerable equipment nearby due to its strong electromagnetic fields. Also, the damage to the drive is destructive; the drive will be unusable after degaussing.

· Physical destruction: Shredding or shattering physical media using various physical destruction methods to keep the data from being recovered. For very low risk information, disposal may mean simply deleting electronic files or using a desk shredder for paper documents. However, these lighter methods can be undone, which makes them inappropriate for more sensitive data. For more sensitive data, stronger methods of destruction at a more granular level are needed to assure that the data are truly irretrievable. Thorough physical destruction can provide the highest assurance of absolute destruction of the data, since it is impossible to reconstruct or recover the data from a disk or drive that has been physically destroyed. However, it involves high capital expenses and is considered an unsustainable and costly way to dispose of data.
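The overwriting approach can be illustrated for a single file. The sketch below replaces a file's contents in place with random bytes; the function name and parameters are hypothetical. It is an illustration of the idea only. On solid-state drives and on journaling or copy-on-write file systems, writing to a file does not guarantee that the original storage blocks are overwritten, which is a file-level version of the "inaccessible areas" limitation noted above.

```python
import os
import tempfile


def overwrite_file(path: str, passes: int = 1,
                   chunk_size: int = 1024 * 1024) -> None:
    """Overwrite a file in place with random bytes (illustrative only)."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            # Ask the OS to push the written bytes to the storage device.
            f.flush()
            os.fsync(f.fileno())


# Demonstration on a temporary file with made-up contents.
original = b"sensitive record\n" * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(original)
    demo_path = tmp.name

overwrite_file(demo_path, passes=2)

with open(demo_path, "rb") as f:
    overwritten = f.read()
os.remove(demo_path)

print(len(overwritten) == len(original))  # size is preserved
print(overwritten != original)            # contents are replaced
```

The time cost mentioned in the list is visible here: every byte must be rewritten once per pass, so the run time grows with both file size and the number of passes.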

For the purpose of protecting privacy, data destruction is a critical and often required process. Personally identifiable information (PII) is often collected by businesses and the government and then stored in various formats. In the United States, at least 31 states and Puerto Rico have enacted laws that require entities to destroy, dispose of, or otherwise make personal information unreadable or undecipherable. The Federal Trade Commission’s Disposal Rule also requires proper disposal of information in consumer reports and records to protect against “unauthorized access to or use of the information.” The rule applies to consumer reports or information derived from consumer reports (31). In Texas, the Business and Commerce Code includes regulations about disposal of certain business records (Tex. Bus. & Com. Code § 72.004). The Texas Administrative Code, Title 13, Chapter 7, establishes the minimum requirements for destruction of local governments’ source documents. However, there is no established law in Texas regulating how the government’s data must be destroyed.

A common question is how long data should be retained before being destroyed. The answer varies depending on the kind of data. For research records, it is recommended they be kept for at least five years and possibly longer, depending on the longest applicable standard (32). For Texas state agencies, the answers can be found in the agencies’ retention schedules. A records retention schedule is a document that identifies and describes a state agency’s records and the lengths of time that each type of record must be retained. Texas state agencies are required to submit their retention schedules to the Texas State Library and Archives Commission (TSLAC) on a timetable established by administrative rule. If a record series does not appear on a certified records retention schedule, it may not be destroyed without obtaining special permission of TSLAC’s executive director. Historically, TxDOT retained five years of crash data. However, in 2015, an update to TxDOT’s TSLAC retention policy was made, and TxDOT moved to a 10-year retention period for crash data. As a result, TxDOT has crash data from Jan. 1, 2010, to present, and will accrue data for 10 calendar years. Records prior to Jan. 1, 2010, have been purged and are no longer available.
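The retention logic described above can be sketched as a simple check. This is a sketch under stated assumptions, not TSLAC's or TxDOT's actual procedure. The series name, the schedule dictionary, and the rule that the retention clock starts on the record date are all hypothetical; real schedules define their own trigger events. The one rule taken from the text is that a series missing from the schedule may not be destroyed without special permission.

```python
from datetime import date

# Hypothetical schedule: record series name -> retention period in years.
# The 10-year crash data period is the figure given in the text.
RETENTION_SCHEDULE = {"crash_data": 10}


def may_destroy(series: str, record_date: date, today: date) -> bool:
    """Return True if the record has passed its retention period."""
    years = RETENTION_SCHEDULE.get(series)
    if years is None:
        # Not on the schedule: destruction needs special permission.
        return False
    try:
        cutoff = record_date.replace(year=record_date.year + years)
    except ValueError:
        # Feb. 29 in a non-leap target year: roll forward to Mar. 1.
        cutoff = date(record_date.year + years, 3, 1)
    return today >= cutoff


print(may_destroy("crash_data", date(2010, 1, 1), date(2019, 12, 31)))
print(may_destroy("crash_data", date(2010, 1, 1), date(2020, 1, 1)))
print(may_destroy("unlisted_series", date(2000, 1, 1), date(2020, 1, 1)))
```

Keeping the schedule as data rather than hard-coding periods means that a policy change, such as a move from five to ten years, is a one-line update.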

Tuesday, February 8, 2022
