Documentum, but built with more recent technologies. At first glance it is a very attractive and flexible solution, but as you know, perfect software doesn't exist, which is why I propose a review of Alfresco against some common expectations you might have.
The objective is to give you a first glimpse of what Alfresco really is.
I wrote this article not to sell you on Alfresco or to discourage you from using it, but to give you a pragmatic overview of the product. Feel free to contact me and/or leave a comment.
If that is all you need, use an NFS or a CIFS share.
If you also need some smart processing of your data, Alfresco could meet your need: you can configure a sequence of actions triggered when data is added to, deleted from or modified in the repository. Such actions include, but are not limited to, copying a file, sending an email to the owner of the file or converting it to another format. Of course, you can also extend the mechanism with your own actions.
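To make this concrete, here is a minimal sketch of what a custom rule action might look like, assuming the Alfresco 3.x Java foundation API; the class name and the property it writes are purely illustrative, and the class still has to be declared as a Spring bean so it appears in the rule wizard.

```java
import java.util.List;

import org.alfresco.model.ContentModel;
import org.alfresco.repo.action.executer.ActionExecuterAbstractBase;
import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ParameterDefinition;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;

/**
 * Hypothetical custom action: stamps the actioned node with a description
 * when a rule using this action fires on a space.
 */
public class StampDescriptionActionExecuter extends ActionExecuterAbstractBase {

    private NodeService nodeService; // injected through the Spring bean definition

    public void setNodeService(NodeService nodeService) {
        this.nodeService = nodeService;
    }

    @Override
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef) {
        if (nodeService.exists(actionedUponNodeRef)) {
            nodeService.setProperty(actionedUponNodeRef,
                    ContentModel.PROP_DESCRIPTION,
                    "Processed by a custom rule action");
        }
    }

    @Override
    protected void addParameterDefinitions(List<ParameterDefinition> paramList) {
        // no parameters for this simple example
    }
}
```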
Sounds good, Alfresco is, after all, an Enterprise Content Management system!
All data stored in Alfresco is uniquely identified in the physical storage. Deleting something from a repository doesn't physically destroy its content; the system moves it to a quarantine area... You can also configure the number of days such data is kept, which is useful to prevent people from accidentally deleting important information.
Alfresco uses jBPM, a mature workflow engine. This enables it to support anything from a simple review-and-validate workflow to complex processes designed with the jBPM Process Designer.
Last but not least, the trigger mechanism on the repository allows you to transform, move, copy, send by email, ... any data stored in Alfresco.
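For illustration, here is a hedged sketch showing how the built-in "mail" action can be invoked programmatically through the ActionService; the class name and message are made up, and in practice you would more often bind such an action to a folder rule so it fires automatically.

```java
import org.alfresco.repo.action.executer.MailActionExecuter;
import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ActionService;
import org.alfresco.service.cmr.repository.NodeRef;

/** Illustrative helper that emails someone about a given document. */
public class NotifyOwner {

    private final ActionService actionService; // injected Alfresco service

    public NotifyOwner(ActionService actionService) {
        this.actionService = actionService;
    }

    public void sendNotification(NodeRef document, String recipient) {
        // "mail" is the name of the out-of-the-box email action
        Action mail = actionService.createAction(MailActionExecuter.NAME);
        mail.setParameterValue(MailActionExecuter.PARAM_TO, recipient);
        mail.setParameterValue(MailActionExecuter.PARAM_SUBJECT, "A document was updated");
        mail.setParameterValue(MailActionExecuter.PARAM_TEXT, "Please have a look at the repository.");
        actionService.executeAction(mail, document);
    }
}
```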
All data stored in Alfresco is governed by a data model. Everything that is stored (users, permissions, documents, directories, ...) is bound to a data model.
I wrote 'a' and not 'the' because data models can be extended to fit your needs. It is therefore possible, and simple, to add custom metadata to, let's say, pictures or office documents.
Out of the box, Alfresco also provides several ready-made data models.
Adding metadata is at the root of Alfresco's content management architecture.
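As a rough illustration, and assuming you have declared a custom content model defining a my:photo aspect with a my:photographer property (both names are invented here), adding metadata from Java code could look like this:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.QName;

/**
 * Illustrative only: the "my" namespace, the my:photo aspect and the
 * my:photographer property would be declared in your own content model XML.
 */
public class PhotoMetadataHelper {

    private static final String MY_URI = "http://www.example.com/model/custom/1.0";
    private static final QName ASPECT_PHOTO = QName.createQName(MY_URI, "photo");
    private static final QName PROP_PHOTOGRAPHER = QName.createQName(MY_URI, "photographer");

    private final NodeService nodeService; // injected Alfresco service

    public PhotoMetadataHelper(NodeService nodeService) {
        this.nodeService = nodeService;
    }

    /** Applies the custom aspect and fills its metadata on an existing picture node. */
    public void tagPicture(NodeRef picture, String photographer) {
        Map<QName, Serializable> props = new HashMap<QName, Serializable>();
        props.put(PROP_PHOTOGRAPHER, photographer);
        nodeService.addAspect(picture, ASPECT_PHOTO, props);
    }
}
```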
If the default access rights don't fit your needs, you can create your own permission model or extend the existing one. You may also want to know that permissions are handled per file and per directory, and that quotas can be applied.
Also, Alfresco uses roles. But what is a role?
It is basically a set of permissions. You bind a role to a user or a group of users, per file or per directory.
A default set of roles is provided, and you are free to extend it.
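As a quick sketch, and assuming the standard PermissionService of the Alfresco foundation API, granting a role programmatically could look like the following; the group name is of course illustrative.

```java
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.security.PermissionService;

/** Illustrative helper that locks a folder down to read-only for one group. */
public class ProjectSpaceSecurity {

    private final PermissionService permissionService; // injected Alfresco service

    public ProjectSpaceSecurity(PermissionService permissionService) {
        this.permissionService = permissionService;
    }

    /** Gives a group read-only access to a folder and stops inheriting from the parent. */
    public void makeReadOnlyFor(NodeRef folder, String groupAuthority) {
        // Stop inheriting the parent's permissions so only explicit grants apply
        permissionService.setInheritParentPermissions(folder, false);
        // Grant the built-in Consumer role (read-only) to e.g. "GROUP_marketing"
        permissionService.setPermission(folder, groupAuthority, PermissionService.CONSUMER, true);
    }
}
```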
We could talk more about it in another article.
There are several possibilities.
This is the first web UI of the product, and it doesn't evolve much these days.
Personally, I recommend using this UI only for administration purposes. There are far more interesting options if you want a great experience for your users.
Share is Alfresco's web 2.0 UI. It is a user-centric collaborative website (although an administration panel appeared in the recent 3.2 version). It aims at providing collaborative sites where people can share their work through wiki pages, forums, a document library, links, a calendar, ...
It is based on the Surf platform, a framework built by Alfresco to easily create web 2.0 sites, which communicates with the Alfresco core through REST.
Surf has recently been integrated into Spring.
Alfresco provides a REST API, also called Webscripts, and a SOAP API.
These two APIs allow developers to create custom remote frontends, which is a good thing: it lets the core server focus on its main task, content management.
CMIS, a newer API, has been implemented since Alfresco 3 in order to facilitate interoperability between ECM systems. This means that a frontend using the CMIS API should be able to switch from Alfresco to another CMIS-compliant backend without any customization.
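As an illustration of that portability, here is a minimal client sketch using the Apache Chemistry OpenCMIS library; the AtomPub URL and credentials are assumptions you would adapt to your own server, and nothing in the code is Alfresco-specific.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.CmisObject;
import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.SessionFactory;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.enums.BindingType;

public class CmisBrowser {

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<String, String>();
        parameters.put(SessionParameter.USER, "admin");
        parameters.put(SessionParameter.PASSWORD, "admin");
        // The AtomPub URL depends on your Alfresco version; adjust it to your server.
        parameters.put(SessionParameter.ATOMPUB_URL, "http://localhost:8080/alfresco/service/cmis");
        parameters.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());

        // Connect to the first repository exposed by the server
        SessionFactory factory = SessionFactoryImpl.newInstance();
        Session session = factory.getRepositories(parameters).get(0).createSession();

        // List the children of the repository root, whatever the backend is
        Folder root = session.getRootFolder();
        for (CmisObject child : root.getChildren()) {
            System.out.println(child.getName() + " (" + child.getType().getDisplayName() + ")");
        }
    }
}
```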
If you need it (Liferay integration for example), the Alfresco repository implements JCR (JSR-170). You can use it locally or remotely over RMI.
However, this method is not recommended since the emphasis is on REST and CMIS.
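For completeness, a JCR access could look roughly like this; the way the Repository instance is obtained (the context file and the bean name) is an assumption that depends on whether you run embedded or over RMI.

```java
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class JcrExample {

    public static void main(String[] args) throws Exception {
        // Embedded usage: the JCR Repository is exposed as a Spring bean by Alfresco.
        // The context file and bean name below are assumptions; adapt them to your setup.
        ApplicationContext ctx =
                new ClassPathXmlApplicationContext("classpath:alfresco/application-context.xml");
        Repository repository = (Repository) ctx.getBean("JCR.Repository");

        Session session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Walk the first level of the repository through plain JSR-170 calls
            Node root = session.getRootNode();
            for (NodeIterator it = root.getNodes(); it.hasNext(); ) {
                System.out.println(it.nextNode().getPath());
            }
        } finally {
            session.logout();
        }
    }
}
```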
We are currently working on a collaborative platform project based on Alfresco; our client uses Alfresco as the foundation of its content management.
Our client's expectations fall into three main categories: administration, a user-friendly UI and portal-like collaboration.
All three are covered by Alfresco. Administration is done through the Web Client UI, the user-friendly UI is a Flex-based website communicating with Alfresco through REST, and the portal-like needs are covered by Share, the collaborative website solution provided by Alfresco, on which we can add new components to better fit the client's needs.
In the end, we have a consistent, performant and scalable solution, and we saved time compared to a from-scratch development.
We have had this kind of requirement: Alfresco as the backend and Liferay as the portal frontend.
Even though Alfresco claims that the integration with Liferay is very good, our experience showed that it is interoperability rather than real integration. The portlet standard allows aggregation of content retrieved as HTML, and this is exactly what Alfresco provides with a few portlets written with its Webscript API, but they are more a proof of concept than a real integration.
Therefore it is up to your development team to write their own portlets, and the corresponding REST services on the Alfresco side, to achieve the integration. If you want Liferay to use Alfresco as its repository backend, that is possible since both implement JSR-170, but your development team will also have to write some glue code.
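To give an idea of the kind of glue code involved, here is a deliberately naive portlet sketch that simply fetches an HTML fragment from a hypothetical Alfresco Webscript and hands it to the portal for aggregation; authentication and error handling are left out on purpose.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

/** Minimal "glue" portlet exposing an Alfresco-rendered document list in Liferay. */
public class AlfrescoDocListPortlet extends GenericPortlet {

    // Assumed custom Webscript returning an HTML fragment; replace with your own URL
    private static final String WEBSCRIPT_URL =
            "http://localhost:8080/alfresco/service/example/doclist?format=html";

    @Override
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        // Fetch the HTML fragment from Alfresco and copy it into the portlet output
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(WEBSCRIPT_URL).openStream(), "UTF-8"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.println(line);
            }
        } finally {
            reader.close();
        }
    }
}
```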
No problem! Alfresco uses the JVM's standard LDAP (JNDI) provider, so it integrates well with any LDAP directory, whether it is Active Directory, OpenLDAP or something else.
Some directories have non-standard fields? No problem, everything is easily customizable through configuration files.
By default, new synchronized accounts are created in the root folder of the repository, but this can easily be changed to a specific /users subdirectory. You can also customize the template of a user's home directory so that every new one is created with specific subdirectories and documents in it.
By default, removing a user from the Alfresco user database doesn't remove their home folder. This behaviour can easily be changed so that the home folder is removed along with the account.
LDAP synchronization, like any other background task performed by Alfresco, is scheduled by Quartz. The latter lets you configure when a synchronization should happen, just like in a standard crontab.
Moreover, Alfresco can work with several directories at the same time and distinguishes between users and groups, which makes it easy to adapt the configuration to your context.
Lastly, if your LDAP directories limit the number of entries returned by a single request, note that Alfresco supports LDAP paged requests since version 3.2.
No problem, but it requires a bit of development. Quartz is provided by default with Alfresco and is configured through Spring beans. If you want to take it out of the picture, you can, but you will have to modify some configuration beans and perhaps write Webscript code so that some jobs can be triggered remotely, by your central scheduler for example. I assure you it is not impossible.
As an example, we were about to do it for a client who wanted the LDAP synchronization to be driven remotely by its central scheduler. In Alfresco, the synchronization is a two-step process: the users and groups are first exported from the directory to an XML file, which is then imported into the repository.
So we had two tasks, two Java classes and one XML file with a well-known structure.
Therefore, to meet the requirement, we just had to run those two steps from the central scheduler instead of from Quartz.
But this solution has a small issue: the Java program requires the Alfresco server to be shut down during the update... No problem, we just have to write a small REST service to execute the import, so there is no longer any need to shut down the server. This REST service is pretty simple since the only thing it has to do is already provided by Alfresco's internal API.
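Here is a rough sketch of such a Java-backed Webscript. The UserImportService interface is a hypothetical wrapper around the internal import code, and depending on your Alfresco version the Webscript framework classes live in org.alfresco.web.scripts or org.springframework.extensions.webscripts; the usual descriptor, response template and Spring bean declaration are omitted.

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.extensions.webscripts.Cache;
import org.springframework.extensions.webscripts.DeclarativeWebScript;
import org.springframework.extensions.webscripts.Status;
import org.springframework.extensions.webscripts.WebScriptRequest;

/** Webscript that lets an external scheduler trigger the user import step on demand. */
public class TriggerUserImportWebScript extends DeclarativeWebScript {

    /** Hypothetical wrapper around the internal import code. */
    public interface UserImportService {
        int importUsersFrom(String xmlLocation);
    }

    private UserImportService userImportService; // injected via Spring

    public void setUserImportService(UserImportService userImportService) {
        this.userImportService = userImportService;
    }

    @Override
    protected Map<String, Object> executeImpl(WebScriptRequest req, Status status, Cache cache) {
        // The central scheduler would call e.g. POST /alfresco/service/example/import?source=...
        String source = req.getParameter("source");
        int imported = userImportService.importUsersFrom(source);

        Map<String, Object> model = new HashMap<String, Object>();
        model.put("imported", imported);
        return model;
    }
}
```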
No problem, LDAPS and HTTPS are perfectly supported.
Kerberos, NTLM, CAS, JAAS: you have the choice. Chained authentication is also supported, for example when an extranet uses form-based login while the intranet uses SSO.
Alfresco supports this kind of setup. The configuration effort is minimal here.
However, you may want to know that if the cluster mode is activated, every Alfresco instance must use the same database instance and the same data repository.
Indeed, this is true for every piece of data that has actual content.
The users, directories (but not their content), metadata and workflows are stored in a database which is under very little load. A simple MySQL instance is enough; there is no need for aggressive optimization on this side.
Oracle, MSSQL, MySQL, PostgreSQL, ... the choice is wide enough. In any case, the load on the database is really small: Alfresco uses neither a complex schema nor high-frequency requests, since it indexes most of its content and relies heavily on Hibernate and EHCache.
The data repository needs storage capacity in line with your expected amount of data.
The index adds about 30% to the data repository size. Note that the index, implemented with Lucene, must be located on a very fast physical drive which supports file locking.
Alfresco also works with this kind of configuration, even though it is a bit complex to set up. However, you must keep only one database, and your backup procedure must still back up everything at the same time.
If you would like to set up an internal repository containing all your data and an external one exposing a read-only subset of it on the internet, you may want to know that Alfresco supports this. It is a kind of cluster configuration.
In this case, the internet-facing instance of Alfresco is read-only and shares the same repository and database as the internal instance, which has full access to the repository; each instance keeps its own index.
We recently ran into this case at Octo, and our solution was to avoid a second instance of Alfresco: we created a REST Webscript on Alfresco so that the "read-only" side could retrieve what it needs through it. The data to be exported were marked with a specific tag.
Alfresco works with Tomcat and JBoss: it is a WAR or an EAR you have to deploy, and that's all.
Alfresco provides an audit mechanism which keeps track of everything that happens to every piece of data in the repository. Configure it cautiously to avoid a big increase in the space occupied by your database and a big slowdown of your server.
Alfresco is Spring-based and its architecture is a good fit if you plan to use it as a foundation on which other specific projects will be built.
Alfresco works with MediaWiki. However, if you only need a wiki, it may be more interesting to use MediaWiki alone and to link it to Alfresco or another ECM only on the day you actually need an ECM.
Alfresco provides Share. However, it doesn't cover every need, and customizing it can be tedious.
The best approach is to use one of Alfresco's facade APIs (REST, SOAP, ...) to build a custom, user-friendly UI.
Alfresco's scalable architecture fits very well, and your team can get up to speed on it quickly. Furthermore, there is an active community which can help you through forums and blogs.
Be advised that your developers should already know Spring, Java 6 and Tomcat in order to become productive quickly.
It is possible, and that's what Octo does. However, full test-driven development (TDD) with Alfresco can sometimes be tedious: not all of Alfresco's own code is covered by tests, and executing a single unit test takes about 40 seconds because a mini instance of Alfresco must be initialized. Having someone experienced with Alfresco on the team is definitely a good way to save time. We ran into many issues when setting up REST unit tests, mostly because of the time taken to run a single test (a mini instance of Alfresco is launched) and the amount of configuration and files to copy just to start one unit test. However, in 80% of the cases unit tests worked well, and since the product is free software, it made our life easier and saved us time when bugs appeared. And... I think that's an important point.
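To illustrate why a single test is so slow, here is roughly what a repository-level test looks like; the assertions are trivial on purpose, and the "ServiceRegistry" bean name is an assumption about the standard Alfresco Spring context. The point is the embedded context bootstrap on the first line of the test.

```java
import junit.framework.TestCase;

import org.alfresco.service.ServiceRegistry;
import org.alfresco.util.ApplicationContextHelper;
import org.springframework.context.ApplicationContext;

/** Sketch of a repository-level test: the whole embedded Alfresco context is booted first. */
public class NodeServiceSmokeTest extends TestCase {

    public void testRepositoryStarts() {
        // Boots the embedded repository (this is the ~40-second part)
        ApplicationContext ctx = ApplicationContextHelper.getApplicationContext();

        // The ServiceRegistry gives access to NodeService, ContentService, ...
        ServiceRegistry registry = (ServiceRegistry) ctx.getBean("ServiceRegistry");
        assertNotNull(registry.getNodeService());
    }
}
```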