
Draw an annotated DFM diagram for the following ER scenario. Use annotations from the list in Week 3 documents.

Upon your arrival at the Department of Emergency Medicine, you will need to register at the reception desk in the waiting room. You must complete a paper form with the requested information and deliver the form to the clerk at the admission desk. The clerk may ask you a few follow-up questions based on the information you provide in the form. Your information is entered by the admission clerk into a computer system and the paper form is filed in a cabinet. Our hospital system has five locations, in Texas, Louisiana, Arkansas, Mississippi and Missouri. You can receive services at all these locations in the future, since they will have access to your electronic admission records, which are stored in the Amazon cloud. All communications between the Amazon cloud and the local systems use secure protocols. While patients' names, SSNs, addresses, telephone numbers and dates of visits are stored in the Amazon cloud as plain text, patients' health- and insurance-related information is saved in encrypted form. Admission clerks have access to unencrypted data, but the doctors, hospital administrators and billing personnel have access to all data. Treatment category information (not specific disease information) is shared with the specified insurance companies for billing purposes. Copies of the health records are made available to medical researchers after removing patients' names, SSNs, addresses and telephone numbers.

Data Flow Modelling

We will introduce a language and notation for modelling components, processes, structure and the flow of data within a system. Models can be annotated and partitioned to show further aspects, including architectural, geographical and legal boundaries. We then show mechanisms for the refinement, partitioning and analysis of these models.

For any given development project it will be necessary to construct a number of data flow models to fully capture the various use cases and scenarios of the particular system in question. Through this we can truly understand and reason about where data flows from and to, through which components, for what uses, and where the control points over this data are.

Basic Notation

The basic elements of a data flow language are those which show the source and target points of data and the data flows between these⁵⁹. In our language we define five kinds of element, each with its own graphical notation, depicted in figure 27.

• Processes

• Users

• Environments

• Stores

• Leaks

59 Peter Gorm Larsen, Nico Plat, and Hans Toetenel. A formal semantics of data flow diagrams. Formal Aspects of Computing, 3, 1994

We make it compulsory to name all elements in the model, with the same name referring to the same element if used in a number of different diagrams or use cases.

Figure 27: Processes, Users, Environments, Stores and Leaks

Processes are places where data is processed by some computational entity; this could be anything from a small filtering function to a large analytics cluster, environment or software component, depending upon the specific modelling needs and the required level of granularity.

Users refer primarily to humans interacting with the system, and Environments to things outside of the system that exist in the 'real world', such as scenes for a photograph or other sources of data. Leaks are explicit notifications to the reader that some data flows in the model flow to unknown places, or points where an unauthorised data flow would be especially problematic. Typically this is used to call into question any confidence that a particular component does not leak that data.

Stores denote any place where data can be held for a period of time, for example: a database, a file (including temporary files), a log file or even a physical piece of media such as a memory stick. Again the granularity depends upon how detailed a model is required, and here we could even see internal partitioning down to the table or some other structural level.

Data flows link elements of the above node types together and denote the general direction of communication – the precise meaning of this is explained later in this chapter. Data flows are named by default by referring to their start and end points. In cases where more than one flow exists between two nodes this uniqueness is not possible and a further distinguishing name should be given.
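To make the notation concrete, the sketch below renders the five element kinds and the default flow-naming rule as a minimal Python data model. This is an illustrative assumption on our part, not part of any standard DFM tooling; the class and field names are invented for the example.

```python
# Minimal sketch of DFM elements and flows, assuming simple
# dataclasses; names and structure are illustrative only.
from dataclasses import dataclass
from enum import Enum

class NodeKind(Enum):
    PROCESS = "process"
    USER = "user"
    ENVIRONMENT = "environment"
    STORE = "store"
    LEAK = "leak"

@dataclass(frozen=True)
class Node:
    name: str       # the same name always denotes the same element
    kind: NodeKind

@dataclass(frozen=True)
class Flow:
    source: Node
    target: Node
    name: str | None = None  # only needed to distinguish parallel flows

    def label(self) -> str:
        # By default a flow is named by its endpoints; a distinguishing
        # name is required when several flows share the same endpoints.
        return self.name or f"{self.source.name} -> {self.target.name}"
```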


We make a syntactic distinction between 'normal' data flows and return data flows. The latter notation is used to emphasise where data might return to a user. Data flows are always directional, as we wish to emphasise the overall flow of data rather than any particulars of the underlying communications protocols. The basic form of a data flow between two processes (and this follows for other node types too) is shown in figure 28. Note that we have chosen to leave the flow unnamed, though we always have the option to name it for reference purposes.

Figure 28: Data Flow Between Two Processes

In figure 29 you will note that we have two flows from one process to another. As noted earlier, this shows two separate 'conversations' or channels of communication between these processes. The actual breakdown into separate channels is largely determined by whatever granularity of modelling is required. Note that we explicitly name the data flows to distinguish between them in this case.

Figure 29: Multiple Data Flows Between Two Processes

If we need to emphasise a flow back to some originator of some data then we can utilise the return flow notation as shown in figure 30. This is purely syntactic and is meant only to emphasise this fact to the reader of the model. The reverse flow is not used to describe the ACK/NACK, error correction, key exchange or other two-way features of the underlying transport protocols. Note that neither data flow is named, though on the return flow we have provided information about the underlying protocol used for transporting information over it; i.e. we do not model control flow data.

In the model shown in figure 31, we see four different elements together. This depicts the prototypical starting situation for many applications and systems. Here we show that data is collected from both a human user and the street scene being photographed; data then flows via the application and its processing to some storage mechanism.

Figure 30: Return Flow Notation

Figure 31: Example Prototypical Initial Data Flow

Note that the data from the street scene 'environment' flows, via the camera sensors and subsystem(s), to the camera application and not via the user of the application. Secondly, the flow from the user to the camera application does not denote any control flow, but rather that the user might be providing data such as personal details, picture meta-data, etc. Furthermore, we are showing no partitioning of the model, so we cannot infer whether the camera application and storage are on the same device, or whether there is even a device at all.

A further situation that we might wish to model is a flow out to some 'unknown' entity – specifically to show some kind of potential leak of information that must be explicitly noted, reasoned about and perhaps later protected against; this is shown in figure 32.

Figure 32: Example Leak to an Unknown Entity

Leaks are always modelled as sinks of information flowing away from a process, store or user via some data flow. Our syntax is defined such that showing a leak stemming from a data flow is not possible; data flows should always be considered liable to being breached and thus to leak information. The purpose of the leak notation, as has been explained, is to alert the reader to possible sinks of data which might exist due to incomplete or poor specification of a system, untrusted components and so on. Note that this is a separate concept from data being leaked through user actions, such as might occur with incorrectly set privacy settings at a social media provider, as depicted in figure 33.
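As a rough illustration of this rule, the sketch below accepts or rejects a single flow based on the kinds of its endpoints: a leak may never originate a flow, and a flow into a leak must come from a process, store or user. The function and kind names are assumptions for illustration only.

```python
# Sketch of the leak rule: leaks are sinks only, fed from a
# process, store or user; everything else is rejected.
def valid_leak_flow(source_kind: str, target_kind: str) -> bool:
    if source_kind == "leak":
        return False  # leaks never originate flows
    if target_kind == "leak":
        # flows into a leak must come from a process, store or user
        return source_kind in {"process", "store", "user"}
    return True  # the rule only constrains flows touching leaks

assert valid_leak_flow("store", "leak")
assert not valid_leak_flow("leak", "process")
assert not valid_leak_flow("environment", "leak")
```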

Figure 33: Example Leak to a Known Entity

The above examples, while simple, show the basic structure and concepts of a data flow model. It is important to remember to concentrate on the directionality of the flows, and that we are not expressing how the underlying communication protocols work.

It is usual that a single data flow model does not show everything. Models should be constructed with consistent naming of elements and flows so that elements can be tracked across use cases and other models as required. If models become too complex or cluttered to read then it is good practice to split them up into a number of individual diagrams. Models must also be backed up with textual descriptions and references to other documentation describing the system at hand.


Annotating Data Flow Models

Each element, flow and even partition in a data flow model can be additionally annotated with information about its nature. The properties of these annotations should be formally defined in some ontological or taxonomical structure; a number of examples are given in the following sections.

Data Subject and Diagram Context

When working with data flow diagrams and privacy it is important that we annotate the initial source or sources of data, and in particular the source about whom the diagram is constructed. The terminology used to describe this individual is 'data subject' and is derived from various pieces of privacy legislation. We will borrow this terminology and utilise the UML stereotype notation⁶⁰ as a convenient method of annotating this.

In the examples provided earlier we have seen this notation being used; for example, in figure 33 it is unambiguous that we are explicitly referring to the data being collected from the marked data subject. The use of this annotation is not compulsory, but its inclusion is strongly recommended, and it is especially necessary when describing larger models or specific situations where there may exist ambiguity about the source or the context of the data.

60 UML Stereotype Notation – seems to fit well here, semiotically speaking

Data Flow Transport Protocols

Documenting the nature of transport over a data flow provides much information about what kinds of data can be collected at the protocol layer, and also gives hints about what kinds of requirements need to be placed on that flow. The transport protocol is generally a combination of the application-layer protocol (HTTP, HTTPS, FTP etc.) and any of the relevant higher-level protocols⁶¹,⁶² as necessary. Syntactically we denote these as a list of the protocol names; an example is given in figure 34, which shows a number of data flows each utilising a variety of transport protocols.

61 Andrew Tanenbaum. Computer Networks. Prentice Hall Professional Technical Reference, 4th edition, 2002. ISBN 0130661023

62 John Day. The (un)revised OSI reference model. SIGCOMM Comput. Commun. Rev., 25(5):39-55, October 1995. ISSN 0146-4833

Figure 34: Annotating Data Flow Transport Protocols

If we write <<http>> then we infer that we mean the HTTP protocol only. If we write multiple transport protocols, such as <<http,https>>, then this means that both are used for different parts of the data conversation over that particular data flow, or that some choice of transport protocol might exist. In such circumstances it might be well worth decomposing the data flow to differentiate the parts of the conversation, or performing more analysis of the system. If no protocol is provided then this means that either none is applicable or that this information is undecided or unknown.
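The sketch below shows one way such an annotation might be read programmatically; the <<...>> stereotype syntax follows the text, while the helper itself is an illustrative assumption.

```python
# Sketch: parse a '<<http,https>>' style annotation into a set of
# protocol names. An empty result models the 'none applicable,
# undecided or unknown' case described in the text.
def parse_protocols(annotation: str | None) -> frozenset[str]:
    if not annotation:
        return frozenset()
    inner = annotation.strip().removeprefix("<<").removesuffix(">>")
    return frozenset(p.strip().lower() for p in inner.split(",") if p.strip())

assert parse_protocols("<<http>>") == {"http"}
assert parse_protocols("<<http,https>>") == {"http", "https"}
assert parse_protocols(None) == frozenset()
```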

The choice of protocol has implications regarding the data being carried and extractable over that channel. Typically most protocols provide source and end device addresses as IP addresses⁶³ and timestamps as a minimum. As described earlier, this is what is termed 'traffic data', and it must be taken into consideration when calculating the whole information content of a channel.

63 We rarely see non-IP-based addressing, such as DECnet and SNA, these days

Data Flow Channel Content

The specification of what content is transmitted over a data flow is the most critical piece of information in any data flow model. The transport protocols only state the mechanisms by which the conversation over a flow is mediated. Further annotation of a flow can show this information content as well as other aspects such as the security level and so on as necessary. We define later a classification system for expressing the contents of a flow, but here we provide a self-explanatory example, shown in figure 35.

Figure 35: Annotating Data Flow Contents

In this example we state that we can expect identifiers of various kinds, location information and timestamps, and that this is in addition to anything provided by the HTTP protocol.

The use of high-level 'types' or 'kinds' such as location or identifier is important in that it explains the content without getting confused with machine types or various representations of data. Data flows are often noted to carry 'JSON data' or to be 'RESTful' – neither of these describes the content, but rather the syntactic representation of the content and an architectural style of calling an API.

Providing detailed information such as a schema or field names often leads to confusion – the naming of data structures does not necessarily provide unambiguous information about what the data contained therein really is. Providing a high-level type gives us the opportunity to focus discussion on the kind of information, and not on whether it is hashed, encrypted or contained in some machine type such as a VARCHAR or int. This is especially true when dealing with location data: geographical location is typically typed not with some geometrical type but as a structure of real numbers.


Calculating the Complete Information Content of a Channel

In order to evaluate the complete set of information available via a channel, it is simply a matter of adding together the information types of the channel content and the information contained in the transport protocol.

Each transport protocol can be mapped to a set of information types according to the parameters it uses for its own internal workings. For example, the HTTP protocol⁶⁴ over TCP/IP⁶⁵,⁶⁶ provides a large number of headers as well as addressing and routing information. If we have a data flow that carries Device Identifiers over the HTTP protocol, then the total content would be Device Identifier, Temporal, Machine Address and various kinds of Content, which could themselves be further refined to reflect significant parameters contained within the HTTP headers.

64 R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. RFC 2616, Hypertext Transfer Protocol – HTTP/1.1, 1999

65 J. Postel. Transmission Control Protocol, September 1981. Updated by RFCs 1122, 3168

66 RFC 791, Internet Protocol – DARPA Internet Programme, Protocol Specification. Internet Engineering Task Force, September 1981
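A minimal sketch of this calculation, assuming information types are modelled as plain sets of names and using an illustrative, non-normative mapping from protocols to the types they contribute:

```python
# Sketch: total channel content = annotated content types plus the
# types contributed by each transport protocol. The mapping below
# is an assumption for illustration, not a normative table.
PROTOCOL_TYPES = {
    "http": {"Temporal", "MachineAddress", "Content"},
    "https": {"Temporal", "MachineAddress"},
}

def channel_content(content: set[str], protocols: set[str]) -> set[str]:
    total = set(content)
    for proto in protocols:
        total |= PROTOCOL_TYPES.get(proto, set())
    return total

# Device identifiers over HTTP yield the totals described above.
assert channel_content({"DeviceIdentifier"}, {"http"}) == {
    "DeviceIdentifier", "Temporal", "MachineAddress", "Content",
}
```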

Annotating Processes

Processes can be annotated to denote the kinds of processing taking place within that element, in much the same way as we annotate the data subject. At high levels of abstraction many tasks may be taking place, and in these situations the absence of a classification is an indication of this. However, after decomposition of nodes, or if we model at a suitably detailed level, explicitly stating that a process falls under certain data transformation classes is a useful indicator to the reader about what might be happening.

Similarly, the incoming and outgoing data flows and their contents can be cross-checked against the process's data transformation classification. Any process with two or more incoming data flows is likely to be performing cross-referencing of the data; similarly, any abstracting or filtering process can be checked by ensuring the output data flows contain less information than the incoming data flows. Furthermore, in the latter situation, any process with two or more incoming data flows is relatively unlikely to be just abstracting or filtering. Such classification can also be applied to user elements as well as process elements, though this kind of usage is rarely seen in practice. Applying this classification to stores, leaks and environments is not permitted. Examples of this notation can be seen in figure 36.
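These cross-checks lend themselves to simple mechanisation. Below is a hedged sketch, again modelling information content as plain sets of type names rather than the book's formal classification:

```python
# Sketch of the consistency checks: a filtering/abstracting process
# must not emit information it never received, and multiple inputs
# suggest cross-referencing.
def check_filtering(inputs: list[set[str]], outputs: list[set[str]]) -> bool:
    """True if every outgoing flow carries a subset of the union of
    the incoming flows (necessary for a filtering/abstracting process)."""
    available = set().union(*inputs) if inputs else set()
    return all(out <= available for out in outputs)

def likely_cross_referencing(inputs: list[set[str]]) -> bool:
    # Two or more incoming flows usually indicate cross-referencing.
    return len(inputs) >= 2

# Dropping location data passes the filtering check...
assert check_filtering([{"Identifier", "Location", "Temporal"}],
                       [{"Identifier", "Temporal"}])
# ...emitting data that was never received does not.
assert not check_filtering([{"Identifier"}], [{"Identifier", "Location"}])
```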


In this example we see the flow of data from the application through the various stores via processes performing varying tasks upon whatever data is being consumed. As well as the three types presented above, we also note that one process is marked <<identity>> to denote that it does not perform any transformation of the data in any form. One process – Data Cleaning – is annotated with two kinds of processing; this means that both kinds of processing take place. It is possible for all four kinds mentioned here to be placed upon a process (see figure 37), which should act as an alert that much decomposition of that process is required to properly understand its internal workings.

The case where no annotation is provided is similar to the aforementioned case with annotating data flows, and suggests that this description is not required, is unknown, or is irrelevant in the current modelling context.

Figure 36: Example Annotation of Processes

Figure 37: An 'Over Annotated' Process

Partitioning

Just working with the flat data flow model as described earlier gives us information about the processes and other elements that make up a system, as well as the various channels carrying the data between them. To go further we need to group these elements together in order to explore particular boundaries over which the data flows. When modelling a system we are required to group processes, stores and even users and environments together to express aspects such as, but not limited to:

• architectural boundaries, including both logical and physical distribution between devices, servers, cloud etc.

• operating system/application process boundaries

• security and trust boundaries

• controller and processor boundaries

• jurisdiction and geographical location

It is often necessary to show multiple aspects in a model. We can do this either by utilising multiple views of the model, or by placing all the aspects on a single view and using a suitable naming or even colouring scheme to differentiate between the aspects. We now explain the partitioning notation.

Simple Partitioning

Partitioning is typically used to show physical boundaries; for example, in figure 38 we show the logical architecture between a user and an application which stores data locally and 'in the cloud'. Note particularly where data flows cross boundaries, especially in this case between the local device and the cloud, which implies a flow outside the control of either. All processes, users, stores and any environments within a partition must be completely enclosed; only data flows can cross partition boundaries – other elements cannot straddle them. If an individual model element would be split by a partition then it must be decomposed into two elements and one or more partition-crossing flows.

In the model in figure 38 we can clearly see the logical architectural partitioning, the interactions and the various contained elements. It should be clear that the nature of the two data flows completely contained within their respective partitions will imply a different set of requirements and implementations from the flow which crosses the partition boundary, specifically the flow between the social camera application and server.

Figure 38: Example Simple Partitioning

Also note the naming of the two stores in this model: despite both having the same name, they are easily distinguished by the partitions they inhabit. Care should be taken, in cases where the partitions are not shown, that ambiguity or misunderstanding does not occur. This could be achieved by modifying the naming convention to take this into account.
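A sketch of the containment rule follows: each element belongs wholly to one partition and only data flows may cross a boundary. The Partition structure is an illustrative assumption; note that the two stores are given distinct names here precisely because a flat name-to-partition mapping, unlike the diagram, cannot rely on the partitions themselves to disambiguate.

```python
# Sketch: find flows whose endpoints live in different partitions;
# these boundary crossings deserve extra scrutiny.
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    members: set[str]  # names of the wholly enclosed elements

def crossing_flows(flows, partitions):
    home = {m: p.name for p in partitions for m in p.members}
    for src, dst in flows:
        if home.get(src) != home.get(dst):
            yield (src, dst)

# Loosely mirroring figure 38: a local device and a cloud partition.
local = Partition("Local Device", {"Social Camera App", "Photo Store"})
cloud = Partition("Cloud", {"SocCam Service", "Cloud Photo Store"})
flows = [("Social Camera App", "Photo Store"),      # stays local
         ("Social Camera App", "SocCam Service")]   # crosses the boundary
assert list(crossing_flows(flows, [local, cloud])) == [
    ("Social Camera App", "SocCam Service")]
```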

Hierarchical Partitioning

The partitioning scheme already described is too simple for many cases and we have to introduce additional structure to capture the hierarchical nature of many properties such as process and execution boundaries or the controller-processor relationship. Within any hierarchy each subsequent partition is completely enclosed within a 'parent' partition. For example, in figure 39 we show a number of process and access boundaries.

In this example we show that processes (or any element) can occur at any level in a hierarchy, as long as each is wholly contained or confined within that layer. As before, an element which occurs outside of the given boundaries, for example the photograph store element, implies that no partition has been assigned to it in the context of the current model. In this case we actually imply that there might be some access or other security- or process-related concern here, given the flow to the leak element named 'snooper'.


Figure 39: Example Hierarchical Partitioning

Overlapping Partitioning

Within some aspects there are situations where the strict hierarchical model does not capture the necessary properties we wish to model. A common scenario is when showing security domains, where responsibilities and access may overlap as shown in figure 40.

Figure 40: Example Overlapping Partitioning

In such cases we must note not only the points where data flows cross boundaries, but also the elements that exist within more than one partition.


Annotating Partitions

Similarly to data flows and the various elements in a data flow diagram, partitions too may be annotated using the syntax we have already shown. This is necessary when presenting diagrams that are complex, that have multiple aspects presented as partitions, or where there is any chance of ambiguity in the reading of the diagram from any externally provided context, for example through a textual description of the diagram. It is especially necessary when showing multiple aspects simultaneously.

One case where this is particularly necessary, and a good example of the use of this kind of annotation, is when describing the controller and processor aspects of a system, as shown for example in figure 41.

In this diagram we are showing multiple aspects – that of the controller/processor and the architectural or logical boundary of some advertising company. The first thing to note, however, is the hierarchical nature of the controller-processor partitioning and the way these are annotated. We have also annotated the user element with the data subject annotation and show the data flow from the user into a controller. This particular data flow is especially important as it sets out the expectations for data processing and collection between the initial controller and the data subject.

From here data flows exit this initial controller to both other controllers and processors wholly contained within those. This is fairly straightforward until we examine the interaction between the controller-processor aspect and other aspects, such as the logical architectural view which is additionally shown in this example.

Note how there would exist two contracts or agreements: between the App Provider controller and the data processing services provided by the advertising company as a processor to the App Provider; and similarly between the social media provider and the advertising company. Finally, take note of the positioning and data flows of the advertising company's data store⁶⁷.

This also serves as a good example of the complexities and discoveries that can be made during modelling, and of the difficulties in confining⁶⁸ data to particular, neatly defined domains and aspects.

67 The stuff of legal headaches

68 Butler W. Lampson. A note on the confinement problem. Communications of the ACM, 16(10):613-615, October 1973

Decomposition

Decomposing the structures in a model is used to open up processes and channels to show more internal structure. Performing this in a systematic manner allows us to better reason about how those particular elements are constructed without accidentally losing important data from the model. We will now describe decomposition over the nodes and data flows in our language. We do not consider decomposition of the partitioning, as the specific semantics of this is generally out of scope of the data flow itself.

Decomposition of Data Flows

We have already stated that a data flow is actually a conglomeration of a number of channels of communication. If we take a single data flow and split it into two, then the following must hold:

• the start and end points of the new flows will be the same as the original flow

• the information carried over either of the new flows will be a subset of the original flow

• the union of the information carried over both of the new flows will be the same as the original flow

• the transport protocol of either of the new flows will be a subset of the original flow

• the union of the transport protocols of both of the new flows will be the same as the original flow

We can demonstrate the above through an example. In figure 42 we have a simple system consisting of a single data flow between two processes. This data flow, as the model shows, carries data classed as Identity, Content, Location and Temporal by various means over the HTTP and HTTPS protocols.

Figure 42: Data Flow Decomposition: Initial Model

Embedded within this data flow is a large amount of information at a high level of granularity. In order to extract the structures that exist inside, we decompose this as shown in figure 43.

Figure 43: Data Flow Decomposition: Decomposed Model


The original data flow has been decomposed into two separate data flows between the original processes, and we can distinguish which content and which protocols are in use over the two parts. To check that this is a correct decomposition from the modelling language perspective, we add the information content and protocols of the two flows back together; we should get the original undecomposed flow.

We can, if necessary, continue with the decomposition of either of these flows in order to capture the relevant and salient points of our system in the model.
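The invariants listed earlier can be checked mechanically. Here is a sketch under the same set-based modelling assumption, using the totals from figure 42 and an assumed split for the decomposed flows:

```python
# Sketch: a flow decomposition is valid when the parts' content and
# protocols are subsets of the original and their unions restore it.
def valid_flow_decomposition(original, parts) -> bool:
    content, protocols = original
    part_content = [c for c, _ in parts]
    part_protocols = [p for _, p in parts]
    return (all(c <= content for c in part_content)
            and set().union(*part_content) == content
            and all(p <= protocols for p in part_protocols)
            and set().union(*part_protocols) == protocols)

original = ({"Identity", "Content", "Location", "Temporal"},
            {"http", "https"})
# The split below is an assumed example, not the book's figure 43.
parts = [({"Identity", "Temporal"}, {"https"}),
         ({"Content", "Location"}, {"http"})]
assert valid_flow_decomposition(original, parts)
```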

Decomposition of a Node

Referring to processes, stores, users and environments, when we decompose these the following remains true:

• Two new data flows are created between the new processes, each carrying the union of all incoming and outgoing data to the original process.

• The protocol of the new data flows is left undefined

• The original data flows are split between the two processes

In the example presented in figure 44 we have a two-stage data flow between three processes. The information content and protocols of the flows are readable from the model as in earlier examples.

Figure 44: Node Decomposition: Initial Model

When a node is decomposed, we effectively split the data flows over the two new nodes. The check here is again simple: if we recombine the two incoming data flows, the result should equal the original single incoming flow; similarly for the outgoing flows.

Figure 45: Node Decomposition: Decomposed Model

In figure 45 we have explicitly shown the logical partitioning of the model around the two new nodes and given a default name to that partition. It is not necessary to show this explicitly, but doing so provides information to the reader of the model that some kind of partitioning exists between those nodes. Of course, this partitioning might be purely for convenience or model granularity, and it is really left to the modeller to decide whether to show it or not.
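A small sketch of this recombination check, under the same set-based assumption used earlier:

```python
# Sketch: after node decomposition, the split incoming flows must
# union back to the original incoming flow (likewise for outputs).
def recombines(original: set[str], split_flows: list[set[str]]) -> bool:
    return set().union(*split_flows) == original

incoming = {"Identity", "Location", "Temporal"}
assert recombines(incoming, [{"Identity"}, {"Location", "Temporal"}])
assert not recombines(incoming, [{"Identity"}])  # information lost
```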

Refinement

Refinement is another process used to develop the model, but this time ensuring that the changes we make only restrict the model⁶⁹. For example, developing a model such that the information in some store is no longer just an Identifier, but a particular kind of Identifier such as a Device Identifier, is a refinement; that is, we move from an abstract model to one that is more specific and detailed.

69 Ralph-Johan Back and Joakim von Wright. Refinement Calculus: A Systematic Introduction (Texts in Computer Science). Springer, April 1998. ISBN 0387984178

The places where refinement takes place are the data flows and stores. In both cases this is generally a simple matter of ensuring that the information types are more specific, and similarly for any transport protocols.
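A hedged sketch of checking one refinement step follows; the subtype table is an illustrative stand-in for the classification system's real hierarchy:

```python
# Sketch: a refined flow/store is valid if every refined type is at
# least as specific as some type in the abstract model.
SUBTYPE_OF = {
    "DeviceIdentifier": "Identifier",  # assumed subtype relation
    "Identifier": "Identifier",
    "Location": "Location",
}

def refines(abstract: set[str], refined: set[str]) -> bool:
    return all(SUBTYPE_OF.get(t, t) in abstract for t in refined)

# Identifier -> DeviceIdentifier is a refinement; an unrelated
# type appearing from nowhere is not.
assert refines({"Identifier"}, {"DeviceIdentifier"})
assert not refines({"Identifier"}, {"Location"})
```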


Privacy Risk Assessment

High-level Steps

Create the DFM diagram to identify surfaces with privacy risks: identify each process, store, channel or environment that may facilitate access to private data/information (see the steps below).

Identify Relevant Requirements: For each such surface, identify the subset of the (15) privacy requirements that could be breached at that surface

This week we will review a process presented in "Privacy Engineering: A Data Flow and Ontological Approach" by Ian Oliver. The book is available on Kindle Unlimited.

DFM: a Data Flow Modeling diagram is a tool for identifying surfaces with privacy risks (review the attached DFM.pdf).

Major Steps:

Drawing the DFM diagram: model the flow of information through the system via different processes and channels (with data possibly saved to some storage)

Annotation: Annotate the DFM diagram with information characteristics, transmission protocols, purpose of usage, risks involved (review the standard annotations that we will use for this class in Annotation.zip).

Decomposition: split a process or a channel in the DFM diagram if that process or channel involves information with different privacy implications

Partition: Partition the DFM diagram (possibly in various ways) to identify groups with some common boundaries (that may have common privacy implications)
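As a rough illustration of these steps applied to the admission scenario above, the sketch below flags model elements touching private information types as candidate risk surfaces. The type names and model entries are assumptions for illustration; the actual list of fifteen privacy requirements comes from the course materials.

```python
# Sketch: walk an annotated model and flag elements whose content
# intersects a set of private information types.
PRIVATE_TYPES = {"Identifier", "Health", "Insurance", "Location"}

def find_surfaces(elements: dict[str, set[str]]) -> dict[str, set[str]]:
    return {name: content & PRIVATE_TYPES
            for name, content in elements.items()
            if content & PRIVATE_TYPES}

model = {  # illustrative elements from the admission scenario
    "Admission Clerk Entry": {"Identifier", "Health"},
    "Amazon Cloud Store": {"Identifier", "Health", "Insurance"},
    "Researcher Copy": {"Health"},
}
for surface, types in sorted(find_surfaces(model).items()):
    print(surface, "->", sorted(types))  # candidate surfaces to assess
```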
