The End of a DBMS Era (Might be Upon Us)


Post by The Punisher » Mon Sep 13, 2010 12:44 pm

Source: BLOG@CACM
The End of a DBMS Era (Might be Upon Us)
Michael Stonebraker

Relational database management systems (DBMSs) have been remarkably successful in capturing the DBMS marketplace. To a first approximation they are “the only game in town,” and the major vendors (IBM, Oracle, and Microsoft) enjoy an overwhelming market share. They are selling “one size fits all”; i.e., a single relational engine appropriate for all DBMS needs. Moreover, the code line from all of the major vendors is quite elderly, in all cases dating from the 1980s. Hence, the major vendors sell software that is a quarter century old, and has been extended and morphed to meet today’s needs. In my opinion, these legacy systems are at the end of their useful life. They deserve to be sent to the “home for tired software.”

Here’s why.

If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in almost any market I can think of. What follows are a few examples.

In the data warehouse market, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is that a column store reads only the columns the query references, whereas a row store reads every column of every row it touches. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores. The interested reader can start with “C-Store: A Column-oriented DBMS” to explore this topic further.
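As a rough illustration of the column-versus-row argument, here is a toy sketch in Python. It is not taken from the C-Store paper; the table, the column names, and the helper functions are all hypothetical. It only counts how many stored values each layout has to read to sum one column, and shows why a sorted column compresses well with run-length encoding. It makes no attempt to reproduce the factor of 50.

```python
# Hypothetical toy table; names and sizes are illustrative assumptions only.
ROWS = [
    {
        "order_id": i,
        "customer": f"c{i % 10}",
        "region": "EU" if i % 2 else "US",
        "amount": i * 1.5,
    }
    for i in range(1000)
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_amount_row_store(rows):
    """Row layout: every field of every row is read to reach 'amount'."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


def run_length_encode(values):
    """Run-length encode a column into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs
```

With this four-column table, the row layout reads 4000 values and the column layout reads 1000 for the same answer. Sorting the `region` column first collapses its 1000 entries into two runs, which hints at why compression helps more in a column store.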

In the online transaction processing (OLTP) market, a lightweight main memory DBMS beats a row store by a factor of 50. Leveraging main memory, together with the fact that no DBMS application will send a message to a human user in the middle of a transaction, allows an OLTP DBMS to run transactions to completion with no resource contention or locking overhead. The interested reader can start with “The End of an Architectural Era (It’s Time for a Complete Rewrite)” to explore this topic further.
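The run-to-completion idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the design of any real engine: `SerialPartition` and `transfer` are hypothetical names, the data lives in a plain dict, and undo is done by copying the whole dict, which a real system would never do. The point it shows is that if each transaction runs alone from start to finish on in-memory data, no locks are needed.

```python
from collections import deque


class SerialPartition:
    """Hypothetical in-memory partition that runs transactions one at a time."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.queue = deque()

    def submit(self, txn, *args):
        """Queue a transaction; nothing executes until run() is called."""
        self.queue.append((txn, args))

    def run(self):
        """Run every queued transaction to completion, strictly in order.

        Only one transaction is ever active, so no locks or latches are
        taken. A failed transaction is rolled back from a full copy of the
        state (a toy undo mechanism, assumed here for brevity).
        """
        results = []
        while self.queue:
            txn, args = self.queue.popleft()
            snapshot = dict(self.balances)
            try:
                results.append(txn(self.balances, *args))
            except ValueError:
                self.balances = snapshot  # abort: restore the prior state
                results.append(None)
        return results


def transfer(balances, src, dst, amount):
    """Move `amount` from src to dst, aborting if funds are insufficient."""
    if balances[src] < amount:
        raise ValueError("insufficient funds")
    balances[src] -= amount
    balances[dst] += amount
    return amount
```

Submitting two transfers of 70 from an account holding 100 commits the first and aborts the second, leaving the balances consistent without any locking code.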

In the science DBMS market, users have never liked relational DBMSs and want a non-relational model and query facility. This was the topic of my last ACM blog, "DBMSs for Science Applications: A Possible Solution."

If you are storing Resource Description Framework (RDF) data, which is popular in the bio community and elsewhere, then “Scalable Semantic Web Data Management Using Vertical Partitioning” points out that column stores are very good at certain RDF workloads. In addition, other ideas, such as “RDF-3X: A Risc-style engine for RDF,” will beat conventional DBMSs in other situations. Lastly, native RDF engines (e.g., Virtuoso, Sesame, and Jena) may well gain traction. The point is that something else will beat conventional row stores in this market.
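For readers unfamiliar with vertical partitioning of RDF, the basic layout is one two-column (subject, object) table per predicate instead of a single wide triples table. The Python sketch below is a minimal illustration under that assumption; the sample triples and the function names are hypothetical, and it ignores everything a real engine does for storage, indexing, and joins.

```python
from collections import defaultdict

# Hypothetical sample triples: (subject, predicate, object).
TRIPLES = [
    ("gene1", "type", "Gene"),
    ("gene1", "encodes", "proteinA"),
    ("gene2", "type", "Gene"),
    ("proteinA", "type", "Protein"),
]


def vertically_partition(triples):
    """Split triples into one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    for rows in tables.values():
        rows.sort()  # keep each table ordered by subject
    return dict(tables)


def subjects_with(tables, predicate, obj):
    """Scan only the table for `predicate`; other predicates are untouched."""
    return [s for s, o in tables.get(predicate, []) if o == obj]
```

A query such as "all subjects whose type is Gene" then reads only the `type` table, which is the same read-only-what-you-need effect that column stores exploit.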

Text applications have never used relational DBMSs. This was pointed out to me most clearly by Eric Brewer nearly 15 years ago in the early days of Inktomi. He wanted to use a relational DBMS to store the results of Web crawling, but found the RDBMS to be two orders of magnitude slower than a home-brew system. All the major Web-search engines use home-brew text software to serve us search results. None use relational DBMSs.

Even in XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors, according to a private communication by Dave Kellogg.

In summary, one can leverage at least the following ideas to get superior performance:

A non-relational data model. If the user’s data is naturally something other than tables and if simulating his natural data model on top of tables is awkward, then chances are that a native implementation of the natural data model will significantly outperform a conventional RDBMS. This is certainly true in scientific data.

A different implementation of tables. If something other than a row store accelerates the user’s queries, then a direct implementation of the relational model using non-row store technology will run circles around a conventional RDBMS. This is true in the data warehouse marketplace.

A different implementation of transactions. Current row stores give you a “one size fits all” implementation of transactions. This can be radically beaten if a user has lesser requirements or if the system can take advantage of workload specific features. This is true in the OLTP marketplace.

One of these characteristics is true in every market I can think of. Hence, in my opinion, the days of a “one size fits all” monolithic DBMS are at an end. The replacement will be a collection of vertical market specific engines, with much higher performance.

You might ask, “What if I don’t care about performance?” The answer: Run one of the open source relational DBMSs. They are mature, reliable, and, best of all, they are free.

You might also ask, “I am dug in deep with my current vendor(s). What do I do?” The answer: Take some portion of your DBMS budget and allocate it to new solutions. Over time, you will move onto better technology.

References

Michael Stonebraker et al., “C-Store: A Column-oriented DBMS,” Proc. 2005 VLDB Conference, Trondheim, Norway, Sept. 2005.

Michael Stonebraker et al., “The End of an Architectural Era (It’s Time for a Complete Rewrite),” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Dan Abadi et al., “Scalable Semantic Web Data Management Using Vertical Partitioning,” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Thomas Neumann et al., “RDF-3X: A Risc-style engine for RDF,” Proc. VLDB Endowment, 1(1): 647-659 (2008).

Disclosure: Michael Stonebraker is associated with four startups that are either producers or consumers of database technology. Hence, his opinions should be considered in this light.
============

Very interesting article; it's a view I hadn't come across until now. Database people, what do you have to say?

Re: The End of a DBMS Era (Might be Upon Us)

Post by mikem4600 » Mon Sep 13, 2010 1:22 pm

NoSQL databases have for a while now been making their way into production systems as well...

Re: The End of a DBMS Era (Might be Upon Us)

Post by Zifnab » Mon Sep 13, 2010 2:14 pm

About time! I don't know how the hardliners will react, though.

Re: The End of a DBMS Era (Might be Upon Us)

Post by Ισοβίτης » Mon Sep 13, 2010 3:01 pm

mikem4600 wrote: NoSQL databases have for a while now been making their way into production systems as well...
Have you seen an improvement in performance? Differences in scaling? Any comparisons? And what kind of data are we talking about?

Personally, I considered them an attractive solution (although there is always a counter-argument).

Re: The End of a DBMS Era (Might be Upon Us)

Post by mikem4600 » Mon Sep 13, 2010 6:55 pm

In general they are faster, but not outlandishly so, and it depends quite a bit on the application (with data on the order of tens of TB). If a) the application does as much as it can in memory without going overboard on queries and b) you use a 2nd level cache (Hibernate's, for example), then you have already avoided a large part of the database's cost. When we used the latter, our TPS went through the roof, by orders of magnitude!

The annoying thing about NoSQL databases is that each one has its own API. Before, you had, say, JPA+Hibernate and couldn't care less whether Oracle or MySQL was underneath (with rare exceptions)... If the customer has euros and wants all the bells and whistles, you give them Oracle. If they are a cheapskate, you set them up with MySQL.
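To make the caching point concrete, here is a minimal read-through cache sketch in Python. It is not Hibernate's second-level cache or its API; `ReadThroughCache`, `FAKE_DB`, and the `db_hits` counter are hypothetical names used only to show how repeated reads of the same key stop reaching the database.

```python
class ReadThroughCache:
    """Hypothetical read-through cache in front of a slow lookup function."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # called only on a cache miss
        self._entries = {}
        self.db_hits = 0  # how many reads actually reached the database

    def get(self, key):
        """Return the cached value, loading it from the database on a miss."""
        if key not in self._entries:
            self.db_hits += 1
            self._entries[key] = self._load(key)
        return self._entries[key]

    def invalidate(self, key):
        """Drop a key, e.g. after the underlying row has been updated."""
        self._entries.pop(key, None)


# Stand-in for a real database table (an assumption for this sketch).
FAKE_DB = {1: "alice", 2: "bob"}
```

Reading key 1 five times through `ReadThroughCache(FAKE_DB.__getitem__)` reaches the database once; the hard part in practice is invalidation, which this sketch leaves entirely to the caller.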