
Data Lifecycle: From Big Data to Smart Data

 

Abstract— Data management is becoming increasingly complex, especially with the emergence of the Big Data era. The best way to manage these data is to rely on a data lifecycle that follows them from creation to destruction. This paper proposes a new Data LifeCycle (DLC), called Smart DLC, that helps turn raw, worthless data into Smart Data in a Big Data context. To do this, we followed a method that consists first of identifying and analyzing existing lifecycles through a literature review, then of defining the phases of our cycle, and finally of modeling it. The cycle is modeled in the form of a process cartography derived from the ISO 9001:2015 standard and the CIGREF framework in order to facilitate its implementation within companies. Smart DLC is structured as a set of management, realization and support processes that can be addressed by an Information System urbanization approach. The advantage of modeling the phases as processes is to address not only the technical activities but also management, which is a major factor in the success of the technique.

 

I. INTRODUCTION

The majority of Big Data projects contain dozens of powerful servers, nested in a complex architecture and with many dependencies. However, these solutions have not been able to solve the problem of Big Data because the size of this data is beyond the capacity of conventional database software tools to collect, store, manage and analyze data [1]. Also, these data are too large to be manipulated and parsed by traditional database protocols such as SQL [2]–[4].

Indeed, a survey conducted by Capgemini in November 2014 and published at the beginning of 2015 found that only 27% of the IT managers interviewed described their Big Data project as a success, which sums up a rather catastrophic situation.

A Big Data system must guarantee compliance with the defined Service Level Agreements (SLA), and it can only confirm its relevance if it is able to investigate the source of problems very quickly in order to repair them.

This situation leads us to advocate smart management of these so-called Big data through an adequate lifecycle. The lifecycles analyzed in [5] are not all adequate for the Big Data context; based on the ranking established in [5], the Hindawi lifecycle [6] was recommended for companies, especially those working in a Big Data context. However, this model presents some weaknesses, notably the absence of the Planning, Enrichment and Destruction phases, and it does not deal with sensitive aspects of the Big Data context such as Quality Control and Management.

In [5], we identified the relevant phases in a Big Data context. In this paper, we propose to design a new data lifecycle for this context. The objective is to propose a cycle that addresses all aspects of Big Data and makes data smart. This is, in a way, Smart Data, described in [7] as the evolution from an initially unstructured mass of data to the smart processing of data and its transformation into knowledge. The proposed lifecycle participates in this intelligence and makes it possible to extract the relevant and useful value from the gigantic mass of received data. This cycle addresses the limitations raised by the enormous volume of digital information that must be exploited efficiently in spite of the requirements [8], [9].

We found in [5] that the phases which constitute the data lifecycle in a Big Data context are very complex. Each phase is considered as one or more complex, operational and independent processes, but these processes are linked to one another to make data management more flexible and smart. In this article, we explain the methodology we used to construct this lifecycle in the form of a process cartography based on the ISO 9001:2015 standard.

To do this, we identify and define the phases of our data lifecycle model, which we have named Smart Data Lifecycle (Smart DLC), and present it in the second section. In the third section, we position our cycle against the existing cycles selected in the literature review presented in [5] to show our contribution and added value. Finally, we conclude in the fourth section.

 

II. SMART DATA LIFECYCLE

In this section, we present our DLC that we named Smart Data LifeCycle (Smart DLC). We identify and define, for this purpose, its phases.

A. Smart DLC phases

We noted in [5] that the majority of phases are shared by most data lifecycles, although their nomenclature sometimes differs. In certain lifecycles, phases are split into sub-phases; in others, several phases are grouped together to form one phase. For example, the collection phase may include the following phases: data reception, data creation, filtering, data integration and anonymization. Some lifecycles include the visualization phase in the analysis phase; others detach it and treat it as a phase in its own right.

Following the analysis of data lifecycles presented in [5], we have retained 14 phases: Planning, Management, Collection, Integration, Filtering, Enrichment, Analysis, Visualization, Access, Storage, Destruction, Archiving, Quality and Security.

The nomenclature of the phases and their semantics differ from one model to another. To this end, we have found it useful to resolve this ambiguity by defining each phase in order to clarify its role in our lifecycle.

 

Planning

This phase appears in the DDI [14], DataONE [15] and USGS [12] lifecycles. It belongs to the management processes and must be transversal throughout the lifecycle. Planning involves all other phases of the cycle and gives a preliminary view of what will happen in the medium and long term.

This phase is monitored by the project team, which will determine all the necessary human and material resources for a good management of the company’s data. To do this, a team dedicated to this task is essential to control the planning and correct any errors. It also requires decision-making on aspects related to data management (data lifetime, data security, data archiving, etc.) in order to define a data management plan for the cycle. During this phase, the planning team provides a detailed description of data that will be used and how they will be managed and made accessible throughout their lifecycle.
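
To make this phase concrete, the decisions taken here can be recorded in a machine-readable data management plan. The following Python sketch is purely illustrative: the structure and its field names (retention_days, masking_fields, etc.) are assumptions, not elements prescribed by Smart DLC.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DataManagementPlan:
    """Hypothetical record of the decisions made by the planning team."""
    dataset_name: str
    sources: List[str]                    # where the data will come from
    retention_days: int                   # data lifetime before destruction
    archive_after_days: int               # when dormant data moves to the archive
    access_roles: List[str] = field(default_factory=list)    # who may read the data
    masking_fields: List[str] = field(default_factory=list)  # fields to anonymize

# Example plan for a hypothetical sensor dataset
plan = DataManagementPlan(
    dataset_name="factory_sensors",
    sources=["iot_gateway", "erp_export"],
    retention_days=365,
    archive_after_days=90,
    access_roles=["analyst", "data_manager"],
    masking_fields=["operator_id"],
)
print(plan)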

 

Management

The Management phase belongs to the planning processes and is presented in a transversal way. This phase was defined by the IBM lifecycle: IBM considers in [13] that management tasks are part of the data lifecycle and therefore adds layers of management over the traditional lifecycle. However, IBM considers management only at the level of data testing, masking and archiving.

For us, Management concerns all the operational phases that directly manipulate the data. It is a phase that manages the whole lifecycle from end to end and makes communication between all phases effective. It helps identify and capitalize on good practices and handle the internal management of the cycle. It also makes it possible to measure internal satisfaction with the service rendered by the lifecycle in question. This process is run by a dedicated team that takes care of everything concerning the lifecycle, in this case the realization processes and, when needed, the support processes.

The inbound and outbound flows of this process are not technical but managerial, and they concern all the other realization and support processes. The Management process coordinates all the teams that manipulate the data lifecycle and manages the conflicts that may arise between them. The success of data management across the entire cycle falls under this process.

 

Collection

Data collection is generally the first step in every data lifecycle. This phase consists of receiving raw data of different natures and making the conversions and modifications necessary to organize them. Cleaning data as they are received in real time saves computation time and memory space. Data quality must be addressed at this level because it optimizes the overall data processing circuit, which can prove very costly, particularly in a Big Data context. A balance must be struck between speed of access to information and quality requirements [16].

Access control before writing is essential in this phase, so that only authorized entities can generate data: it verifies the legitimacy of users' identities, protects against spoofing, and checks that execution conforms to the data access policy, as explained in [10]. This is a security aspect that will be detailed later and that contributes to the reliability of the collected data. Indeed, we have divided this phase into several sub-processes in order to better manage data at the entrance of the IS.

Depending on company policy, data is either received from outside, created internally, or both at the same time. When the data is received from outside, we have set up an additional entry control that acts as Restriction rules, blocking any stream that does not comply with the rules previously defined in the planning phase, such as noise or erroneous or redundant data sent by IoT devices. These rules play an important role and act as a first filter against non-exploitable data that would otherwise complicate the data management system.
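
A minimal sketch of such restriction rules at the entrance of the IS is given below. It is illustrative only: the rule names and thresholds are assumptions, and a real deployment would take its rules from the planning phase.

from typing import Callable, Dict, Iterable, List

# A restriction rule returns True when a record is acceptable at the entrance of the IS.
RestrictionRule = Callable[[Dict], bool]

def not_empty(record: Dict) -> bool:
    return record.get("value") is not None

def in_physical_range(record: Dict) -> bool:
    # Reject obvious sensor noise, e.g. temperatures outside -50..150 degrees C (assumed bounds).
    return -50.0 <= record.get("value", float("nan")) <= 150.0

def deduplicate(seen_ids: set) -> RestrictionRule:
    def rule(record: Dict) -> bool:
        rid = record.get("id")
        if rid in seen_ids:
            return False              # redundant data sent by IoT devices
        seen_ids.add(rid)
        return True
    return rule

def collect(stream: Iterable[Dict], rules: List[RestrictionRule]) -> List[Dict]:
    """Keep only the records that satisfy every restriction rule."""
    return [r for r in stream if all(rule(r) for rule in rules)]

raw_stream = [
    {"id": 1, "value": 21.5},
    {"id": 1, "value": 21.5},      # duplicate
    {"id": 2, "value": 999.0},     # noise
    {"id": 3, "value": None},      # erroneous
]
accepted = collect(raw_stream, [not_empty, in_physical_range, deduplicate(set())])
print(accepted)   # -> [{'id': 1, 'value': 21.5}]

Rules are evaluated in order and evaluation stops at the first failing rule, so cheap checks should be listed first.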

 

Integration

The purpose of the integration phase is to provide a coherent schema over data coming from multiple independent, distributed and heterogeneous sources of information, so that users can access and query these data as if they came from a single source.

This phase involves putting in place rules and policies defined by the planning team to integrate the distributed data, because the collection methods differ. Data from the different sources are combined to form a homogeneous data set that can be easily analyzed. Because there is no agreement on data standards, stakeholders tend to use different methods of data collection and management, which complicates integration. This phase exists in the cycles proposed in [6], [12], [15]–[18]; for the remaining cycles, it is included in the collection phase.

We have chosen to separate this phase from the collection phase because we believe that data can be collected without being integrated in situations where the received data do not need to be organized or cleaned. This case is typical of enterprises that handle internal data with well-defined formats, in other words structured data. In the context of Big Data, however, most data are unstructured and come from several heterogeneous sources, hence the importance of the integration phase, which centralizes all data for better exploitation in the following phases.

The planning team defines the correspondence between the global schema and the source schemas when the two types of schema differ.
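
The sketch below illustrates one way such a correspondence could be expressed and applied. The source names and field mappings are invented for the example; the paper does not prescribe a particular mapping format.

from typing import Dict, List

# Hypothetical correspondence between each source schema and the global schema,
# as it might be defined by the planning team.
SOURCE_MAPPINGS = {
    "crm":     {"client_name": "customer", "amount_eur": "amount"},
    "webshop": {"buyer": "customer", "total": "amount"},
}

def to_global_schema(source: str, record: Dict) -> Dict:
    """Rename source-specific fields to the global schema field names."""
    mapping = SOURCE_MAPPINGS[source]
    return {global_field: record[source_field]
            for source_field, global_field in mapping.items()}

def integrate(batches: Dict[str, List[Dict]]) -> List[Dict]:
    """Combine heterogeneous sources into one homogeneous data set."""
    return [to_global_schema(src, rec)
            for src, records in batches.items()
            for rec in records]

unified = integrate({
    "crm":     [{"client_name": "ACME", "amount_eur": 120.0}],
    "webshop": [{"buyer": "Globex", "total": 75.5}],
})
print(unified)
# -> [{'customer': 'ACME', 'amount': 120.0}, {'customer': 'Globex', 'amount': 75.5}]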

 

Filtering

This phase was introduced by the Hindawi lifecycle, ranked best in the analysis in [5]. It consists of restricting the large data flow. It is necessary to distinguish between the Restriction performed during Collection and this phase: restriction concerns noisy and erroneous data, whereas filtering concerns good-quality data that have not been blocked by the Restriction rules but that the planning team considers useful to filter out because they add no value for the company. This filtering can have a positive impact on data analysis and thus on the correctness of decisions. It also reduces computation time and the memory space occupied by the data, which optimizes the subsequent phases.

The filtering rules are defined by the planning team and need to be reviewed and controlled because they change often. A poor choice of filtering rules can have negative impacts on the decisions made by company executives, so particular attention must be given to their definition. Filtering should also not restrict the diversity of the data: if filtering is too aggressive, data will be lost.
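
The following sketch shows how such reviewable filtering rules might be kept separate from the pipeline code, so that the planning team can change them without touching the processing logic. The rules themselves (monitored regions, minimum amount) are arbitrary examples.

from typing import Callable, Dict, List

# True = keep the record. Unlike restriction, these rules drop well-formed data
# judged to be of no value for the company.
FilterRule = Callable[[Dict], bool]

def only_monitored_regions(record: Dict) -> bool:
    return record.get("region") in {"EU", "NA"}

def above_minimum_amount(record: Dict) -> bool:
    return record.get("amount", 0.0) >= 10.0

# Kept in a plain list so the planning team can review, reorder or replace rules.
ACTIVE_RULES: List[FilterRule] = [only_monitored_regions, above_minimum_amount]

def apply_filters(records: List[Dict], rules: List[FilterRule]) -> List[Dict]:
    return [r for r in records if all(rule(r) for rule in rules)]

records = [
    {"region": "EU", "amount": 120.0},
    {"region": "APAC", "amount": 300.0},   # valid data, but outside the monitored scope
]
print(apply_filters(records, ACTIVE_RULES))   # -> [{'region': 'EU', 'amount': 120.0}]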

 

Enrichment

This phase was defined in the Big Data lifecycle in [11]. Data enrichment involves making structural or hierarchical changes to the received data. It adds information to the collected data in order to improve their quality. This phase assumes that ready, standard data (repositories) are available to enrich the newly collected data. The enrichment data are updated continuously and automatically to contribute to their quality. The planning team defines the enrichment rules and ensures their application. It controls them continuously because the rules can change according to several parameters (time, place, the data themselves, decision-making needs, and so on).
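
A minimal illustration of enrichment against a reference repository is given below; the repository content and field names are hypothetical.

from typing import Dict, List

# Hypothetical reference repository maintained and refreshed by the planning team.
PRODUCT_REPOSITORY = {
    "P-001": {"category": "sensor", "supplier": "ACME"},
    "P-002": {"category": "actuator", "supplier": "Globex"},
}

def enrich(records: List[Dict], repository: Dict[str, Dict]) -> List[Dict]:
    """Add reference attributes to each collected record when a match exists."""
    enriched = []
    for record in records:
        extra = repository.get(record.get("product_id"), {})
        enriched.append({**record, **extra})   # the original record is left intact
    return enriched

print(enrich([{"product_id": "P-001", "qty": 3}], PRODUCT_REPOSITORY))
# -> [{'product_id': 'P-001', 'qty': 3, 'category': 'sensor', 'supplier': 'ACME'}]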

 

Analysis

It is the most important phase and has been introduced in almost all lifecycles [6], [11]–[16], [18]–[21]. In this phase, the data are exploited and analyzed to draw conclusions and interpretations for decision-making. It makes it possible to extract information and knowledge from the received raw data; this knowledge provides a basis for decision-making by policy makers. The planning team sets and defines the methods for analyzing data according to the objectives fixed by the strategic decision-makers. The chosen methods must respect certain criteria that are sensitive for the company. Indeed, data mining methods are numerous, and choosing a specific method requires a considerable effort by the planning team to make it practical, communicable and objective. The team must always keep in mind the fundamental questions, which have not disappeared (a brief illustrative sketch of the analysis step is given after the list). These questions relate to:

· The intensity of the work to be done at the data collection level (and its intensive nature),

· The very high frequency of the data,

· The time required for data processing and coding,

· The relevance of sampling when only a small number of cases can be selected,

· The generation of results,

· The credibility and quality of the conclusions and their usefulness to the decision-making staff,

· The action plan.
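
As announced above, the sketch below shows, in a deliberately simplified form, how an analysis step could turn integrated records into a few decision-oriented indicators. It stands in for whatever data mining method the planning team actually selects.

from statistics import mean
from typing import Dict, List

def analyze(records: List[Dict]) -> Dict[str, float]:
    """Compute simple indicators from the records (illustrative placeholder for a real method)."""
    amounts = [r["amount"] for r in records]
    return {
        "count": float(len(amounts)),
        "total": sum(amounts),
        "average": mean(amounts) if amounts else 0.0,
        "max": max(amounts, default=0.0),
    }

print(analyze([{"amount": 120.0}, {"amount": 75.5}]))
# -> {'count': 2.0, 'total': 195.5, 'average': 97.75, 'max': 120.0}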

 

Access

The access within the Big Data Application Provider is focused on the communication/interaction with the Data Consumer. Similar to the collection, the access may be a generic service such as a web server or application server that is configured by the Data manager to handle specific requests from the Data Consumer. This activity would interface with the visualization and analytic phases to respond to requests from the Data Consumer (who may be a person) and uses the processing and platform frameworks to retrieve data to respond to Data Consumer requests.

In addition, the access phase confirms that descriptive and administrative metadata and metadata schemes are captured and maintained for access by the Data Consumer and as data is transferred to the Data Consumer. The interface with the Data Consumer may be synchronous or asynchronous in nature and may use a pull or push paradigm for data transfer [22].
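
As an illustration of the pull paradigm mentioned above, the sketch below exposes analysis results to a Data Consumer through a small HTTP endpoint built on Python's standard library. The route, port and payload are assumptions made for the example.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Results assumed to have been produced by the Analysis phase.
ANALYSIS_RESULTS = {"average": 97.75, "count": 2}

class AccessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/results":                      # pull request from the Data Consumer
            body = json.dumps(ANALYSIS_RESULTS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), AccessHandler).serve_forever()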

 

Visualization

The Hindawi and Big Data DLCs introduced this phase in [6] and [11] respectively. The other cycles include it in the data analysis phase. It consists of displaying the results of the analysis in a clever and intelligent way so that decision-makers can easily understand these results and then make decisions.

There are several ways to visualize the data. Opting for one or another requires considerable effort, because a badly chosen type of display can distort the analysis and thus mislead the decision-makers. The planning team should pay attention to this phase, either selecting predefined visualization types or designing graphic representations specific to the use case. The visualization results must be checked before being published to the decision-makers, and the anonymity required for privacy must be preserved.

We chose to detach the visualization phase from the analysis phase because we consider that the visualization of data in a Big Data context is much more complex compared to a normal context.

 

Storage

This phase is part of all data lifecycles. It falls within the support processes and must be transversal in the cycle. The storage concerns all the other phases of the cycle and makes it possible to store the data throughout its lifecycle in order to have a continuous traceability of data in each phase of the cycle and to know its state of progress.

Storage must be managed with reliability, availability and accessibility. Storage infrastructures must provide reliable space and a robust access interface that can analyze large amounts of data and also store, manage and retrieve them. Thus, the storage capacity must take into account the large increase in the volume of data. However, the enormous amount of data received obliges the planning team to recommend intelligent management of this data, because storage is a fundamental and sensitive element of the company's information system.

It must be adapted to the needs of the company in terms of:

• Storage capacity: from a few gigabytes to several hundred terabytes or even petabytes;

• Flexibility: the ability to allocate resources automatically and to position data intelligently;

• Reliability: automatic data backup and restore capability;

• Security: data protection, privacy and encryption (this parameter is detailed later).

The planning team is responsible for building a data management strategy. To this end, it studies the essential parameters such as the age and vitality of the data (use rate and degree of criticality) with all the actors in the company.
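
One way to make the age and vitality criteria operational is a simple tiering rule that decides where data should live; the thresholds and tier names below are assumptions, not values from the paper.

from datetime import date, timedelta

def storage_tier(last_access: date, use_rate: float, critical: bool) -> str:
    """Illustrative tiering rule based on data age, use rate and degree of criticality."""
    age = date.today() - last_access
    if critical or age < timedelta(days=30) or use_rate > 10:
        return "hot"        # fast, replicated storage
    if age < timedelta(days=180):
        return "warm"       # cheaper storage, still online
    return "cold"           # candidate for the Archiving phase

print(storage_tier(date.today() - timedelta(days=200), use_rate=0.1, critical=False))
# -> cold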

 

Destruction

This phase deletes data once they have been used and have become useless, without added value. Few data lifecycles introduce a data destruction phase (among them CRUD [23], PII [24], the Information lifecycle [10] and the Enterprise lifecycle [17]), because most consider that data can still be used even if the need is not visible at the present moment. However, we believe that at some point, even if enough space is available for archiving data, we will have to choose between removing obsolete data and investing more to continually increase storage and archiving capacity; the latter option causes additional costs for memory capacity and for processing an increasingly gigantic volume of data.

We have chosen to destroy the data once it has reached its end of cycle because our cycle is hybrid: linear and cyclical at the same time. The destruction of data must be done intelligently so that it only concerns unnecessary data. To do this, the planning team defines rules and policies for the destruction of data in consultation with the company’s decision-makers.

In a Big Data context, this phase plays a key role and allows concentrating only on the data that will bring added value to the company.
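
A hedged sketch of how the destruction rules agreed with the decision-makers might be applied is shown below; the retention period and catalog fields are illustrative.

from datetime import date, timedelta
from typing import Dict, List

RETENTION = timedelta(days=365)   # assumed retention period from the planning phase

def select_for_destruction(catalog: List[Dict]) -> List[str]:
    """Return the datasets that are both past retention and no longer bring added value."""
    today = date.today()
    return [entry["dataset"]
            for entry in catalog
            if today - entry["last_used"] > RETENTION and not entry["adds_value"]]

catalog = [
    {"dataset": "sensor_2018", "last_used": date(2019, 1, 1), "adds_value": False},
    {"dataset": "sales_current", "last_used": date.today(), "adds_value": True},
]
print(select_for_destruction(catalog))   # -> ['sensor_2018']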

 

Archiving

This phase consists of the long-term storage of data for possible later use. In [13], effective data lifecycle management includes intelligence not only at the level of the archived data, but also in the archiving policy, which is based on specific parameters or business rules such as the age of the data or the date of their last use. It can also help the planning team develop a hierarchical, automated storage strategy to archive dormant data in a data warehouse, thereby improving its overall performance.

In addition, in [10], three main operations are required in this phase: encryption techniques, long-distance storage, and a data retrieval mechanism. These operations move the least used data to separate storage devices for long-term storage; the storage and archiving devices are thus separated. The output streams of this process are obsolete data that are no longer used by the enterprise. The enterprise can still reuse them; in this case, the data follow the cycle again and are received by the Collection process.
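
The sketch below illustrates an age-based rule for deciding which data are dormant and should move to the archive; the 90-day threshold is an assumed business rule, not a value from the cited works.

from datetime import date, timedelta
from typing import Dict, List, Tuple

ARCHIVE_AFTER = timedelta(days=90)   # assumed rule: data unused for 90 days is dormant

def split_active_dormant(catalog: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate dormant data (to be moved to long-term storage) from active data."""
    today = date.today()
    dormant = [d for d in catalog if today - d["last_used"] > ARCHIVE_AFTER]
    active = [d for d in catalog if today - d["last_used"] <= ARCHIVE_AFTER]
    return active, dormant

catalog = [
    {"dataset": "logs_q1", "last_used": date.today() - timedelta(days=200)},
    {"dataset": "sales_current", "last_used": date.today()},
]
active, dormant = split_active_dormant(catalog)
print([d["dataset"] for d in dormant])   # -> ['logs_q1']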

 

Security

This phase belongs to the support processes. It must be present throughout the cycle as a transversal process. Security is present in the IBM [13] and Hindawi [6] lifecycles. This stage of the data lifecycle describes the implementation of data security and its means, as well as the roles in data management needed to keep the data confidential.

This phase concerns three essential security parameters that we explained in [5], namely data integrity, access control and privacy. These parameters must be checked throughout the cycle. Thus, in the Collection phase, the data are collected over a secure (encrypted) transmission channel, which, on the one hand, helps verify the reliability of the data since they are received from secure sources and, on the other hand, allows only authorized entities to write into the collection system. These sources can write into the collection system only after they have been authenticated.

The privacy aspect was introduced by the IBM lifecycle in [13]. It consists of masking personal data and ensures that a user can use a resource or service without revealing his or her identity.

This phase involves not only using fictitious data to protect privacy, but also preserving the company’s actual production data. It is the planning team that defines the data that will be masked and the masking rules with a guarantee of anonymity. For the storage and archiving phases, security is ensured by encrypting data, with a data recovery policy in case of problems. However, this phase of data masking must not lead to loss of data value (biased data), so a compromise between privacy and value added must be found.
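
A minimal sketch of field-level masking is given below. Hashing is used here only as one possible masking technique; the masked fields are examples of what the planning team might designate, not fields named in the paper.

import hashlib
from typing import Dict, List

# Fields to mask, as they might be designated by the planning team (hypothetical).
MASKED_FIELDS = ["customer", "operator_id"]

def mask(record: Dict, fields: List[str] = MASKED_FIELDS) -> Dict:
    """Replace personal identifiers by irreversible pseudonyms while keeping other fields usable."""
    out = dict(record)
    for f in fields:
        if f in out:
            out[f] = hashlib.sha256(str(out[f]).encode()).hexdigest()[:12]
    return out

print(mask({"customer": "ACME", "amount": 120.0}))
# -> {'customer': '<12-character pseudonym>', 'amount': 120.0}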

 

Quality

In the analysis in [5], only two DLCs include this phase, namely the USGS [12] and CIGREF [16] cycles.

In [16], we notice that Quality control is provided during transitions from one phase to another. This is achieved through a definition of the quality requirements, the qualification of the level of precision required and then the implementation of controls to measure the satisfaction of the data quality. However, the USGS lifecycle sees this phase differently in [12]. It introduces into this phase the protocols and methods that must be used to ensure that data are properly collected, managed, processed, used and maintained at all stages of the scientific data lifecycle. For this cycle, it is a transversal phase that concerns all the other phases.

We have chosen the quality phase as a transversal process that must be checked at all stages of our data lifecycle. This phase refers to the support processes that offer all the human and material resources necessary for operational processes in terms of security and quality.

This process is performed by a dedicated team, which executes all the rules already defined by the planning team.
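
The sketch below shows one way a quality control could be executed at the transition between two phases, using a completeness threshold as a stand-in for the required level of precision; the threshold and mandatory fields are illustrative assumptions.

from typing import Dict, List

REQUIRED_COMPLETENESS = 0.95   # assumed quality requirement defined by the planning team

def completeness(records: List[Dict], mandatory_fields: List[str]) -> float:
    """Share of records in which all mandatory fields are filled in."""
    if not records:
        return 0.0
    complete = sum(all(r.get(f) is not None for f in mandatory_fields) for r in records)
    return complete / len(records)

def quality_gate(records: List[Dict], mandatory_fields: List[str]) -> bool:
    """Return True when the batch may move on to the next phase of the cycle."""
    return completeness(records, mandatory_fields) >= REQUIRED_COMPLETENESS

batch = [{"id": 1, "value": 21.5}, {"id": 2, "value": None}]
print(quality_gate(batch, ["id", "value"]))   # -> False (only 50% complete)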

 


III. RELATED WORKS

In order to situate our DLC among the other lifecycles in the literature, we present in this section the top five of the twelve lifecycles studied in [5], following the final ranking established there: Hindawi DLC [6], Information DLC [10], Big Data DLC [11], USGS DLC [12] and IBM DLC [13]. We then illustrate the contribution of this paper in relation to these relevant cycles.

 

A. Hindawi Lifecycle

Most companies view their data as a valuable asset. In this sense, they provide considerable effort for the development and optimal use of these data. A data lifecycle is modeled to obtain a consistent description between data and processes. According to [6], Hindawi DLC consists of the following phases: collection, filtering & classification, data analysis, storing, sharing & publishing, and data retrieval & discovery. They are presented in figure 2.

 


It is the first lifecycle in the literature to use Big Data terminology and technologies. We note a very important phase, filtering, which comes before the analysis and makes it possible to restrict the large data flow.

Hindawi DLC was ranked best in the analysis conducted in [5]. It is a lifecycle adapted to the Big Data context and has several advantages [6]: the data sources are Big Data sources; the cycle can handle various formats of structured and unstructured data; in the collection phase, the data are cleaned to allow better treatment thereafter; before analyzing the large mass of data, a filtering phase is carried out in order to focus only on the company's needs; storage is managed intelligently, because a management plan is implemented to ensure reliability, availability and accessibility; and finally, data security is ensured by a dedicated phase, which encompasses policies and procedures to protect legitimate privacy, confidentiality and intellectual property.

However, Hindawi DLC does not introduce the removal of data that become obsolete, and we believe that data destruction is very important in a Big Data context. It also does not check the quality of the data throughout the cycle, and security is not transversal across all phases. Smart DLC encompasses all phases of the Hindawi cycle and positions them more intelligently: security is ensured at every phase of the cycle, unlike in the Hindawi cycle, which presents it as a phase limited in time, and our cycle destroys, at a given moment, the data that become useless.

 

B. Information Lifecycle

This DLC corresponds to a cloud environment. It consists of seven phases: Data Generation, Data Transmission, Data Storage, Data Access, Data Reuse, Data Archiving, and Data Disposal [10].

The advantage of this DLC in the Big Data context is the smart data management during the archiving and disposal phases.

The Information lifecycle solves the Hindawi cycle's disadvantages concerning the limited presence of security and the lack of a data deletion phase: it introduces security as a cross-cutting phase that concerns all other phases. Its strongest point for the Big Data context is the intelligent management of the destruction and archiving phases. However, this cycle pays little attention to the phases of data collection, planning, management and quality control, which for us are key phases in making the entire cycle intelligent. Smart DLC contributes to the intelligence of data collection by introducing restriction and filtering processes for data not desired by the company, as well as their integration. In addition, Smart DLC presents the results of data analysis in an intelligent way to enable decision-makers to make efficient and timely decisions.

 

C. USGS Lifecycle

USGS DLC does not justify or allow the acquisition of useless data: data must be acquired and maintained to meet a scientific need. For this reason, the idea of managing data throughout a lifecycle becomes all the more relevant. This cycle focuses on all issues of documentation, storage, quality assurance and ownership [12].

USGS DLC consists of the following phases as mentioned in figure 3: Plan, Acquire, Process, Analyze, Preserve, Publish/Share, Describe, Manage quality and Backup & secure.

 


This DLC introduces new phases that do not have a direct impact on the data, such as planning, metadata description, quality and security. Another advantage of this DLC is the transversal presence of data description, quality and security.

The major advantage of the USGS cycle is the cross-sectional presence of the quality and security phases [12]. For this cycle, planning is the first phase to be carried out, so it is a phase that does not concern the other phases in a transversal way. There is no destruction of data. It is obvious that Smart DLC solves all the disadvantages found in the USGS cycle. Indeed, planning is a cross-sectional phase and data destruction is present at the end of the cycle.

 

D. Big Data Lifecycle

This DLC, adapted to Big Data [11], is not very different from other traditional data lifecycles. The new phase of filtering and enriching the data after collection seems interesting; however, storing the data throughout the lifecycle appears incompatible with Big Data, since it does not address the Volume concern but, on the contrary, makes data management more complicated through redundancy. The Big Data DLC phases are illustrated in figure 4.

 


The Big Data cycle, initially introduced by Yuri Demchenko in [11] and taken up in [1]–[3], [23], [29], [30], adopted new methods for improving the collection process. A filtering and enrichment phase was added after collection to reduce the mass of data initially collected. The particularity of this model lies in the storage phase, where the data are retained during all the stages of the lifecycle, which allows the reuse and reformatting of data. Although this data lifecycle was introduced in a Big Data context, its intelligence is not achieved, as no planning, security or quality phases are present, and the data cannot be deleted.

Smart DLC enjoys the benefits of this lifecycle and solves its shortcomings to make the end-to-end cycle smarter.

 

E. IBM Lifecycle

In [13], IBM considers that management tasks are part of the data lifecycle. IBM DLC adds layers of management over the traditional lifecycle. It defines three essential elements for managing the data lifecycle during the different phases of the data's existence, as presented in figure 5.

 


Smart DLC retains the advantage of IBM DLC, which consists of adding management and masking layers, and further adds layers of quality and security that make the data lifecycle more intelligent.

 

IV. CONCLUSION

In this paper, we proposed a new vision of the data lifecycle that considers data management as an Information System urbanization project. To this end, we have modeled this lifecycle as a process cartography derived from the ISO 9001:2015 standard and the CIGREF framework [25], [26].

The advantage of considering the data lifecycle as a process cartography is to effectively support the company's data management tasks and their transformations. This cartography takes into account the existing situation and makes it possible to better anticipate the internal and external evolutions or constraints impacting the data lifecycle, relying on technological opportunities when necessary.

Representing the data lifecycle as an urbanized process cartography facilitates its transformation and change. Indeed, the data lifecycle changes from one company to another, and it can change within a single organization that often changes strategies, which implies major structural changes and complicates interdependencies. This increasing complexity has implications for the costs, times and risks of data management and decision-making.

To gradually control the evolution of data with the necessary reactivity and to reduce IT costs, a response is provided by the urbanization of data lifecycle processes. This urbanization in the form of a process cartography aims at a lifecycle capable of supporting data management with the best cost/quality/time trade-off. We have chosen a cartography based on the ISO 9001:2015 standard [25] and the CIGREF framework [26], which integrates the process approach and distinguishes three types of processes within the company: management, operational, and support processes.

 

REFERENCES

[1] J. Manyika et al., ‘Big data: The next frontier for innovation, competition, and productivity’, 2011.

[2] K. Krishnan, Data warehousing in the age of big data. Newnes, 2013.

[3] A. Reeve, Managing Data in Motion: Data Integration Best Practice Techniques and Technologies. Newnes, 2013.

[4] P. Zikopoulos, C. Eaton, and others, Understanding big data: Analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media, 2011.

[5] M. El Arass, I. Tikito, and N. Souissi, ‘Data lifecycles analysis: towards intelligent cycle’, in Proceedings of the Second International Conference on Intelligent Systems and Computer Vision, ISCV’2017, Fès, 17-19 April 2017. DOI: 10.1109/ISACV.2017.8054938

[6] N. Khan et al., ‘Big data: survey, technologies, opportunities, and challenges’, Sci. World J., vol. 2014, 2014.

[7] A. Lenk, L. Bonorden, A. Hellmanns, N. Roedder, and S. Jaehnichen, ‘Towards a taxonomy of standards in smart data’, in Big Data (Big Data), 2015 IEEE International Conference on, 2015, pp. 1749–1754.

[8] D. Farge, ‘Du Big data au smart data : retour vers un marketing de l'émotion et de la confiance’, LesEchos.fr, 2015.

[9] F. Meleard, ‘Smart data, l'avenir du contenu’, Echosfr, 2015.

[10] L. Lin, T. Liu, J. Hu, and J. Zhang, ‘A privacy-aware cloud service selection method toward data life-cycle’, in Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on, 2014, pp. 752–759.

[11] Y. Demchenko, C. De Laat, and P. Membrey, ‘Defining architecture components of the Big Data Ecosystem’, in Collaboration Technologies and Systems (CTS), 2014 International Conference on, 2014, pp. 104–112.

[12] J. L. Faundeen et al., ‘The United States Geological Survey Science Data Lifecycle Model’, US Geological Survey, 2014.

[13] IBM, ‘Wrangling big data: Fundamentals of data lifecycle management’, 2013.

[14] X. Ma, P. Fox, E. Rozell, P. West, and S. Zednik, ‘Ontology dynamics in a data life cycle: challenges and recommendations from a Geoscience Perspective’, J. Earth Sci., vol. 25, no. 2, pp. 407–412, 2014.

[15] S. Allard, ‘DataONE: Facilitating eScience through collaboration’, J. EScience Librariansh., vol. 1, no. 1, p. 3, 2012.

[16] S. Bouteiller, Enjeux business des données. Comment gérer les données de l'entreprise pour créer de la valeur? CIGREF, 2014.

[17] S. Chaki, ‘The Lifecycle of Enterprise Information Management’, in Enterprise Information Management in Practice, Springer, 2015, pp. 7–14.

[18] J. B. Jade Reynolds, In the context of the Convention on Biological Diversity. World Conservation Monitoring Centre, 1996.

[19] C. L. Borgman, J. C. Wallis, M. S. Mayernik, and A. Pepe, ‘Drowning in data: digital library architecture to support scientific use of embedded sensor networks’, in Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, 2007, pp. 269–277.

[20] E. Deelman and A. Chervenak, ‘Data management challenges of data-intensive scientific workflows’, in Cluster Computing and the Grid, 2008. CCGRID’08. 8th IEEE International Symposium on, 2008, pp. 687–692.

[21] I. Gam, ‘Ingénierie des exigences pour les systèmes d’information décisionnels: concepts, modèles et processus: la méthode CADWE’, Paris 1, 2008.

[22] W. L. Chang, ‘NIST Big Data Interoperability Framework: Volume 6, Reference Architecture’, Spec. Publ. NIST SP - 1500-6, Aug. 2017.

[23] X. Yu and Q. Wen, ‘A view about cloud data security from data life cycle’, in Computational Intelligence and Software Engineering (CiSE), 2010 International Conference on, 2010, pp. 1–4.

[24] A. Michota and S. Katsikas, ‘Designing a seamless privacy policy for social networks’, in Proceedings of the 19th Panhellenic Conference on Informatics, 2015, pp. 139–143.

[25] Organisation internationale de normalisation, ‘Quality management systems requirements’. 2015.

[26] CIGREF, ‘Les référentiels de la DSI : Etat de l’art, usages et bonnes pratiques’. 2009.

[27] M. Armbrust et al., ‘A view of cloud computing’, Commun. ACM, vol. 53, no. 4, pp. 50–58, 2010.

[28] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, ‘Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility’, Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599–616, 2009.

[29] M. Chen, S. Mao, and Y. Liu, ‘Big data: A survey’, Mob. Netw. Appl., vol. 19, no. 2, pp. 171–209, 2014.

[30] K. Davis, Ethics of Big Data: Balancing risk and innovation. O’Reilly Media, Inc., 2012.

  
