DESIGNING RATIONAL DATABASE TABLES

computers

ALTE DOCUMENTE

WhiteCap

Wife 1.0 Just Released

Visual Basic.NET

Install Subauthentication

TouchLite Setup for Vib Net Operations

Monitor Active Web and FTP Connections

Project: 3D Ant Attack For Windows

DESIGNING RATIONAL DATABASE TABLES

In this video, we are going to discuss how to make our database tables relational. Or in other words, how to separate data into smaller related tables that are linked together by a series of keys. And we will talk about what primary and foreign keys are, and how to create a relationship between the keys and the tables. And finally, we will discuss normalization, which are the steps that we can take to best divide up tables as we are designing them.

Now, before we really get started, I want to reemphasize that the main goal of relational databases is to maintain data integrity. We will talk about this quite a bit in the proceeding lessons. But, now we are going to see what distinguishes relational databases over other types of databases that may have historically a save data in what are usually termed flat files, a data files that are filled with redundant and therefore possibly incorrect data.

So, how is this accomplished? Well, we have headed several times that we need to create smaller related tables to, kind of, break the data up in some way, and link these little tables together using keys. But what are keys, and how do they work? There are primary keys and foreign keys, and they work together to create a relationship between two tables in our databases.

So, let us start with primary keys. Designating a column as a primary key means that when we need to uniquely identify a single row of data, we need to nothing more than look at the column that is designated as a primary key. A key is lot like a license plate on your automobile. There might be quite a few other black Chrysler mini vans in the world. But, my mini van is uniquely identified by its license plate. So, when you pinch if there are two black Chryslers mini vans parked next to each other, I can tell which one is mine, by simply looking at the license plate. The same thing is true with a database. We may have several customers who have a popular name like John Smith, for example.

[Start: 00:00:00 End 00:02:03]

So, how do we tell all of the John Smiths apart, at least from a database perspective? Can we use their phone number or their physical mailing address? Possibly, but it is also possible for a father and a son who share the same name to live at the same address, have the same phone number and so on. Besides it is not reliable piece of information you use because not everybody has a phone number or an address, and honestly, it is not very efficient because numbers work better for this purpose than strings of characters.

However, if we were to create a column called customer ID for our table, and give it the identity column property, we could use it to uniquely identify one John Smith from another one. So, each row would then be unique. From a database perspective, a primary key column has a special constraint applied to it called a unique constraint. Even if we wanted to, the database would not allow us to insert two records with the exact same customer ID. And this is important because we cannot reliably link data between tables if we do not have reliable way of telling each row apart. So, this is why we use the identity column. It is a great way of ensuring that each row will have at least one thing that makes it distinct and unique.

The second type of key is a foreign key. This allows us to imply that there is a relationship between one or more records in the second table, and a single record in the first table. And so, this is where a quick visual example might really help out. So, I have a very simple table called customers. You can see that I have got a first and a last name, and I have also created a CustomerID. Now, the intent with this CustomerID column is to use it as a primary key for our table. So, I am going to use the identity property on that given column.

[Start: 00:02:03 End 00:04:03]

Additionally, I am going to designate it then as a primary key, and you can see the little key symbol next to the CustomerID in the table.

Next, I have a second table that I have created called orders, and I have defined a column called OrderID, and my intent is to make that the primary key of this table. Now, the next field down the CustomerID column, I intend for this to be the foreign key, and I will use this to relate the orders back to the customer. So, a single customer can have one or more orders. If we have a row in the customer's table with the CustomerID of 1, we might have multiple rows in the orders table that each have a different OrderID, but have the same CustomerID, and that allows us to relate those two tables together. I have designated the CustomerID column with that little foreign key graphic, the FK. Now, do not let the fact that we have an OrderID as the primary key of that table, throw you off because, in turn, the orders table may have other tables that is related to. Each order may have a number of order items that are associated with it. So, one order can have multiple items in the shopping cart, for example. And so, we would use the OrderID as the primary key of that relationship and in another table that is not shown here, we may have an order items table that has a foreign key of OrderID. Of course, we have not displayed that here. But, the fact of the matter is that a table can have one, only one primary key, but one or more foreign keys.

[Start: 00:04:03 End 00:05:57]

Now, I have added one little last graphic, which is a little designation from the type of diagram called an end of the relationship diagram that shows that there is relationship between these two tables that are defined, and that the nature of their relationship is that one customer may have many orders to the use of the crow's feet on the right hand side of the graphic.

Let me interject one little side note here, and that is the use of this datetimestamp field. We could have just as easily called this order datetime, which actually might be a little more descriptive. But if you ever see this term datetimestamp, typically what this means is the date and time that this particular record was inserted into the database and that is all. It is just a common invention that some developers use.

So, one last bit of information before we move on. When we sit down to actually develop the tables that we design on a napkin or on a sheet of paper or in our head using SQL Server Management Studio Express or a tool like Visual Web Developer Express Edition, and use the visual tools for developing databases like this on the previous lessons, we will explicitly declare the CustomerID columns and the OrderID columns as primary keys. And then we will explicitly declare and create a relationship between the foreign key and the primary keys by creating a relationship constraint. It is also known as a foreign key constraint between the two tables and we will demonstrate how to do this near the end of this particular video.

So, let us take a moment to stop what we're doing and go to Visual Web Developer 2005 Express Edition, and use the visual tools to create 2^nd table from our previous lesson example, define primary keys, and create relationships between two tables.

[Start: 00:05:57 End 00:08:00]

We are taking a look at SQL Lesson 02 of the project that we originally created in lesson 2, but have been referring to throughout the last two or three lessons. Really for our purposes, the only thing we are really interested in is the MySample.mdf database that we added, and we created a table called customers. Let us go ahead, and take a look at that table. I am going to right click on the table, and select Open Table Definition from the context menu, and as you recall from the previous video, we have defined CustomerID and set that with the property of identity specification to 'yes', and set the identity increment to 1, and the identity seed to 8. But, at this point, we have not really defined this as a primary key. It is our intent to use it as a primary key, but it has not been defined as such just yet. It is very simple to do this. All you need to do is select the column in our designer view, and then select 'set primary key' from the toolbar. And now, we see that we have little key next to our column name. Now, we have a primary key set on this column, and this will become very important in just a moment. So, let me go ahead and save that change, that fundamental change to our database table. Then, I am going to create a new table. Let me select 'add new table' by right clicking on tables and selecting a new table and here, we are going to define a couple of columns. We know we want this to be our primary key. Now, our primary key has to have a value in it. It cannot be empty. So, I am going to have to deselect 'allow nulls'. Additionally, we want to use this as our primary key, and as we saw from the previous lesson that involves us turning on the identity specification, and then select 'yes', and leave that false identity increment and identity seed to 1.

[Start: 00:08:00 End 00:10:11]

So, that should be all that we need to do for the moment with our primary key column. Now, the next thing that we want to do is create a foreign key relationship between this table and our customer table. Now, our customer table primary key is CustomerID. Do I have to name the foreign key exactly the same? Absolutely not. I could make this, say, customer identification, customer number, customer record number, just record number, ID, whatever you want to call it, it is fine. But, it is really common convention to keep these names the same because it really cuts down on confusion on what this column is, and what is it being used for. So, I am going to name it the same thing. I am going to name this CustomerID. It too would be a type integer, and it has to match the same data type as our customer table since that is the primary key. It cannot be 'null'. We have to have a customer for this order. We cannot have just an orphaned order sitting out there related to no customer. That just would not make sense. However, we are not going to turn on the identity specification. That is not the job of this particular column of information. It will reflect and use our primary key CustomerID. It will share the same values, but we do not want every new order to have a unique CustomerID because then they would not match. The CustomerID in this table would not match the CustomerID in this table. So, I want to just come back to that concept, I think, it will more clear in just a few moments. There is some other information about this table like an order date. We might have a smalldatetime for that, and we might even save the order amount although that is not necessary.

[Start: 00:10:11 End 00:12:10]

We can calculate that on the fly, but the purposes here, we will go ahead, and do that smallmoney, and we can have some other information that make these off, not nullable. And then, I am going to go back to my OrderID column, and this one, I too want to set the primary key to the OrderID, and select 'save', and we are going to call this orders. At this point, there is still no relationship between customers and orders. Sure we defined both of the tables, we even have a CustomerID column in this table and a CustomerID column in this table, and our intent is to eventually set up a formal relationship between the two, but what if I decide not to? Well, if we do not do this, then there is no referential integrity, there is no data integrity between these two tables. What do I mean by that? Well, it would be easy to get into a scenario where we have an order that is not associated with a customer. Let me demonstrate how that is possible. I'm going to go to the orders table, and I am going to show table data by right clicking on orders and selecting 'show table data' from the context menu, and here I am going to enter the information for a given order. I am going to guess at the CustomerID, CustomerID 21, I am going to put in the order dates, and the order amounts; and then hit 'tab' on my keyboard, and it allowed me to create an order, order number 1, CustomerID 21. Now, what we would typically do when it say who is the customer who ordered this, CustomerID 21? Let us look in the customer table. So, I am going to right click, and select 'show table data' so I can view through the customers' table. But, there is a problem. As the data comes up, I see that there is no customer 21. I have a customer No. 4, 9, and 11, but not 21.

[Start: 00:12:10 End 00:14:02]

So, there is no integrity enforced between these two tables, meaning that I can have an order that is orphaned that has no relationship to a real customer. And that is a problem. That means I have allowed bad data to enter my database. So, what can I do to rectify this situation? Well, it is actually pretty simple. I am going to close both of these tables, and then I am going to, let us just open up one of them here. We will open up this particular one. We are going to 'open table' definition, and really the only purpose in doing that was just start and get this little toolbar back. The data toolbar allows me to do several things. We have already looked at being allowed to create a primary key. Now, we can remove the primary key. We are not going to do that. What I really want to do is work with a relationships view. So, I am going to click on relationships, and this gives me a little foreign key relationships dialogue box.

So, we want to formally define the relationship between the customer and the orders table. I am going to show you one of the gotchas, whenever you're first starting, so you know how to resolve the problem. So, let us start with that. I am going to go through the process of actually creating the relationship or we can get stuck at a certain point, and since it such a common mistake, kind of a head-scratcher, as you are getting started, I wanted to point this out, and then we will resolve the issue. So, let us go ahead, a foreign key relationship, and then click the 'add' button on our little dialogue box, and you can see that it automatically names a foreign key for us since it is a database object, it will have its own name, and it is not saved yet. You can see little asterisk next to this. So, as we change some properties about the relationship, that name will change automatically. We do not need to do anything. We can formally rename it if we want to here in this particular property in the property window. What we want to do is go to the table and call specification property, select the little ellipses button. You get a new dialogues and columns. Now, here we want to select the primary key table and the foreign key table, and then once we selected those tables, then we are going to select both the primary key and the foreign key respectively.

[Start: 00:14:02 End 00:16:08]

So, I want my primary key table to be the customer's table, and the CustomerID would be the primary key. So, that is correct. But, over here, we can see that the foreign key table is customers. Well, that does not make sense. We want the foreign key table to be orders. So, how do we change that? Well, it is a 'read only' field. You cannot really change it. So, this can become very confusing if you are just getting started. What I suggest is that if you find yourself in this scenario, and you are just not able to change the information you need to change, you understand which table is primary key and which is supposed to be your foreign key, and you are not getting the results that you want, you need a backup and start over. This is the little gotcha I was warning about a few moments ago. So, when I cancel, I must select this, and click 'delete', and start over from the beginning. I am going to close this customer table because this is the source of the problem in the first place. I do not want to change anything about it so, I select 'no' in the dialogue box, it pops open. Instead we are going to open up the orders table this time, and to our table definition view. Now, I am going to select the relationships button, and click 'add'. Now at this time, we can see that we have a different name. I am going to again go to the tables and columns specification. Click on the little ellipses. This time, I am going to selects customers as the primary key table, and CustomerID is the primary key. We can see now, we have pre-populated the orders table as the foreign key, and we need to change the foreign key because we do not want orders to be related to CustomerID. We want to compare apples to apples. So, we want to create a relationship between the CustomerID and the CustomerID in the two tables. We will 'ok', and now you can see I have a foreign key between orders and customers.

[Start: 00:16:08 End 00:18:02]

One other little piece of information that is automatically set for us is that we do want to enforce the foreign key constraint. We could choose to ignore that, but we would have the same scenario that we described a few moments ago, and that would be counterproductive to what we are trying to achieve here. So, I am going to go ahead, and click 'close', and that will save the relationship now. I selected the 'save all' button, and it says that the following tables will be saved in your database. Do you want to continue? It makes changes to both of our tables. I am going to select 'yes' that we want to go ahead, and do that. Now, notice that I get another little problem here, and the reason is because I have the situation that I described earlier where we have data that is not able to relate between the orders table and the customers tables. So, what I need to do is go ahead, and shut this down. I need to show table data, and I need to delete the offending row of data that is holding me back from accomplishing what I need to. Since there was no CustomerID defined in the customer table, I was not able to create this new constraint because we have problems with the data that we need to correct first. So, I am going to select this row, and click 'delete' on my keyboard. I am going to say 'yes' to the notice that I am about to delete a row. I will go ahead, and close this. And I can go back to the exercise of creating that relationship once again.

[Start: 00:18:02 End 00:19:30]

And this time, it allowed me to do it.

You might be wondering yourself what are we really accomplishing by doing this? I mean, we went all this work, but what were the ramifications? Was this really too important ramification of creating this relationship between these two tables? Let us talk about them both in terms of the data. Let us go ahead, and try to recreate the example from a little bit earlier whenever we added an errant CustomerID, and so let us open up the 'show table' data for the orders. This time, let us go ahead, and try to create an order for customer 21, and see what happens. This time when we do it, we are going to get a message that no row was updated. We could not commit the row to the database because it violated a foreign key constraint as you can see right here, the foreign key constraint FK ordered customers. Correct the errors, and retry or press 'escape' to cancel the changes. So, what would happen if we were to change this to a CustomerID we know exists in our customer table. Let us go ahead, and save that, and it worked just fine. Okay? Now, the same can be true. The second ramification is from the other perspective, and it is probably more important, and that is this; whenever we look at the table data here, and we want to delete a customer from our database, we are going to select CustomerID 9, Brian Faley. Can we delete him from the database? I am going select the delete button on my keyboard, and it says, 'you are about to delete one row'. Yes, let us do it. You cannot do it again because the delete statement conflicts with the reference constraint FK orders customers. So, by deleting CustomerID 9, it would have left an orphaned row in our orders table, thereby violating referential integrity, again we are all about keeping the data as safe as possible by promoting a high degree of integrity with the data, and by doing this, we would have deleted a key piece of information, which would have made our entire data store suspect. Can we rely on this data? Well, we absolutely cannot because there is data missing.

[Start: 00:19:55 End 00:22:26]

So, let us go ahead, and click 'ok'. I just want to just impress one more thing on you. Let us create another order for Brian Faley just to show that one customer can have multiple orders. So, let us go ahead, and open up 'show table data', and we will create a second order for Brian. I will change the date and the amount, and I can create a third order for him and so on. So, one customer can have multiple orders. Any given order can only belong to one customer. This is just one type of relationship that we have enabled through the way that we defined the tables. We could also create a scenario where multiple entities have a relationship to other multiple entities. Little more complex, and we are not going to really dive into that for the purposes of these videos. However, there are other types of relationships that are available to you, and that is based solely on how you create your table structure.

Hopefully up to this point, there is nothing that I have said that may have completely overwhelmed you. But you might have a question or two in your mind about the process that you go through to decide what fields of data belong in each of your tables, and going back to our discussion a few lessons ago about Codd who was the father of relational databases, recall the problems that he was trying to solve and the situations that modern computing at that time found itself in were there was a lot of redundant data. And so, he came up with several steps that you could take to convert this flat file format that was common in that day to this new relational database format, and that process or those steps that you could take, he referred to as steps of normalization.

[Start: 00:22:26 End 00:24:36]

And each time you applied one of the steps or rules, you will be able to say that our current database structure is normalized to third normal form or second normal form. And that nomenclature would be something that your peers would understand as you went through this process. Now, there are about seven forms of normalization. Most people never normalize past the second or third form because it does introduce a bit of theoretical complexity into the mix. However, as you continue on in your progress and working with databases, no doubt, you will become introduced to some of these concepts, and be more familiar with the steps on normalization. We're only going to cover the first two in this lesson. Now, the key is that you really do not have to memorize everything that I am about to show you. Some of this will become second nature to you after you have seen it done, and you won't need to think so hard on successive passes at your database structure.

So, as I said earlier, we are going to take a look at the first two normal forms, and beginning with the first one, you can see that the formal definition of it is that you should remove repeating columns by creating a new table, moving the columns into the new table, and linking back to the old table, in a one-of-many relationship. I have got quick before and after a snapshot. You can see that in the before, I have got, kind of, a convoluted mess, customer order having invoice number column, first name, last name, first order date, first order amount, second order date, second order amount, third order date, third order amount, and so on. You can see that this would lead really to a redundant situation where you have to continue to add additional columns every time you want to insert new information about a given customer. In this case, every time he makes another purchase from your company, you will need to insert more columns at the end of his particular row of data. Hopefully, you can see how inefficient that would be. One customer who purchases from you fifty times, would necessitate that the other customers who may have only ordered once or twice, have 48 or 96 unused columns worth of data, and that represents real space on your hard drive. It also makes reporting on that information very complicated. By applying first normal form, we take out first order date, second order date, third order date, and so on. We take out first order amount, second order amount, third order amount, and so on; and move them into their own tables. You can see the after effect where we have two tables now, a customer table with information unique to the customer in that particular table and then information unique to an order in the second table.

[Start: 00:24:36 End 00:27:44]

And then we create a relationship, just as we have done in the previous example between the customer ID as a primary key in the customer table, and the customer ID as the foreign key in the order table. The data regarding a customer is highly cohesive. We would not put in there information about a specific order because it needs to be more generalized to potentially include all information about a given order. Also, from an order table perspective, it is very cohesive, just information about a single order is contained here, not multiple orders, but one order per row. So, hopefully, this makes sense; and seems pretty obvious why we would want to do this.

Let us move on now to second normal form, and it is defined this way. Move repeating fields into a new table that contains a primary key, and relate it back to the old table using the foreign key. In this scenario, you can see the danger of having Bob Tabor listed multiple times or Arnold Rimmer listed numerous times within our database table. The problem with redundant data is that you do not know which version of the data is most correct, and represents the current state of that given row. In other words, as you can see, there are two Bob Tabor records, record number 1 and record number 2, the CustomerID 1, the CustomerID 3. One of them has been working for a company called Bob Co. and the other one has been working for a company LVS. Similarly, you can see with Arnold Rimmer, he is working only with one company, Red Dwarf, but since we have listed also how many employees does this particular account have, we do not know which version of Red Dwarf is accurate. Do they have 4000 employees? Do they have 4400 employees? And this is fairly innocuous example. But, it really highlights the difficulty of redundant data occupying multiple rows within a flat file. What second normal form hopes to improve on is to, kind of, split these out into two entities. By splitting out the company name or the company, which work on the accounts, from the individual people that work for the company, which we are calling contacts, we can eliminate the redundancy. And we may have new information about Bob Tabor, for example, that he works for a new company, we would just simply create a new row within our accounts table, and then associate him with the new account record. The same is true for Red Dwarf Inc. As an account if the number of employees changes as our sales people are going through an updating an information about this particular account, then we know that we have the data saved in one place, in one record; and we do not have to do some forensics to figure out which version of the data is most accurate, latest, and most pertinent to our particular query.

[Start: 00:27:44 End 00:30:56]

I went through this whole exercise of talking about normalization because while it is important to understand the fundamentals of it, I think that, as I said earlier, what is more important is that you get the spirit of what is trying to be accomplished here. And that is the decomposition of information into smaller tables, and the process that you go through in the thinking that you do in how to group multiple columns of related information together in an accurate way so that your tables are truly relational and enforce referential integrity, and keep your data safe. As I mentioned earlier, there are seven potential forms of normalization. Most people move past the second or third form, and it is really not necessary to do that because while it is a good idea to apply some of the rules of normalization honestly as you get into some of the higher forms, there is such a thing as too much of a good thing. Not only does it has become confusing from my design perspective, but then for those who are actually trying to use your table structure to do something meaningful like create an application or create a report, it makes the querying of the data from the database very difficult to save the list. So, there is this old DBA saying that I heard once and I really like, and it is "Normalize until it hurts". So, go through the process of normalization, and really think through why you are doing what you are doing. But then denormalize until it works. Denormalization is basically when you step back from that process until you get something that is comfortable and usable by you and the users of the database.

[Start: 00:30:56 End 00:32:47]

We have covered quite a bit of ground in this particular video. We talked about primary and foreign keys from a theory perspective, and then demonstrated it, how to create those as well as relationships using the visual tools in Visual Web Developer 2005 Express Edition. We talked about how referential integrity is enforced by creating relationships between tables. And then we talked about the process that we use in order to accurately split up columns of data into a more correct form using the process of normalization.

In the next lessons, we are going to move beyond the creation of relative tables, and talk about how to use the structured query language to get data out of and insert data into our tables programmatically.

I hope you are enjoying this series of lessons, hope you are finding them informative in that they are practical enough so that you can use them as you are developing your own projects. If you like this sort of training, please visit www.LearnVisualStudio.NET where you can download over 500 screen cam video tutorials just like this one on all topics related to .net including C#, Visual Basic, ASP.NET and more. Thank you.

[Start: 00:32:47 End 00:34:05]

Document Info

Accesari: 1537
Apreciat:

Comenteaza documentul:

Nu esti inregistrat
Trebuie sa fii utilizator inregistrat pentru a putea comenta

Creaza cont nou

A fost util?

Daca documentul a fost util si crezi ca merita
sa adaugi un link catre el la tine in site

Copiaza codul:
in pagina web a site-ului tau.

eCoduri.com - coduri postale, contabile, CAEN sau bancare

Politica de confidentialitate | Termenii si conditii de utilizare