Pentaho 8.1 Enterprise Edition was released a few months ago. It offers a wide range of features and improvements, including improved streaming steps in PDI, expanded Spark capabilities in PDI, Google Cloud data enhancements, stronger AWS security, a significant push forward in Big Data steps, and improvements to both data integration and Business Analytics.
I have analyzed version 8.1, and here is my take.
This release continues Hitachi's commitment to interoperability in complex ecosystems. In my analysis of the release, I identified a few key themes:
- Doubling down on the cloud
- Improving the core data integration engine, and
- Expanding some key analytics use cases
Doubling Down on the Cloud:
Google Cloud
Pentaho has significantly expanded its support for Google Cloud Platform (GCP). Pentaho 8.1 lets you connect seamlessly to Google Cloud Storage and use a VFS browser to import and export data to and from Google Drive. I remember well that, a few years ago, one of BizCubed's customers implemented its own plugin for Google BigQuery. Now, with the addition of the Google BigQuery Loader job entry, BigQuery is available as a data source in the Pentaho User Console and in the PDI client. You can set up JDBC connections and build ETL pipelines that access and store data with Google Cloud big data services. It is great to see Pentaho integrating more holistically with GCP: the Google Drive integration and the ability to run analytics on BigQuery are both robust steps forward.
Amazon Web Services
Pentaho has also strengthened its position in the AWS sphere. Pentaho Data Integration can now assume IAM role permissions to get secure read/write access to Amazon S3, so you no longer need to hardcode credentials in every step. The much lighter credential-management burden adds flexibility: PDI can accommodate different AWS security scenarios, which improves the user experience while reducing security risk. The revised S3 CSV Input and Output transformation steps let PDI extract data from AWS with these security enhancements in place. These steps seamlessly pick up IAM security keys from environment variables, from your machine's home directory, or from an EC2 instance profile. Pentaho has also added Adaptive Execution Layer (AEL) support for Amazon EMR.
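The three credential sources named above can be read as a provider chain: try each source in turn and use the first one that yields keys. The Python sketch below illustrates only that idea. It assumes the sources are tried in the order listed, the standard `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` variable names, and the usual `~/.aws/credentials` file layout; the EC2 instance-profile lookup is stubbed out. All function names are hypothetical, and this is not Pentaho's (or the AWS SDK's) actual code.

```python
import configparser
import os
from pathlib import Path
from typing import Callable, Iterable, Mapping, Optional, Tuple

# (access key id, secret access key)
Credentials = Tuple[str, str]


def from_environment(env: Optional[Mapping[str, str]] = None) -> Optional[Credentials]:
    """Look for the standard AWS environment variables."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    return (key, secret) if key and secret else None


def from_home_directory(home: Optional[Path] = None, profile: str = "default") -> Optional[Credentials]:
    """Look for a shared credentials file at <home>/.aws/credentials."""
    home = Path.home() if home is None else home
    path = home / ".aws" / "credentials"
    if not path.is_file():
        return None
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_section(profile):
        return None
    key = parser.get(profile, "aws_access_key_id", fallback=None)
    secret = parser.get(profile, "aws_secret_access_key", fallback=None)
    return (key, secret) if key and secret else None


def from_instance_profile() -> Optional[Credentials]:
    """Placeholder (assumption): a real client would query the EC2 instance metadata service."""
    return None


def default_chain(env=None, home=None) -> list:
    """Providers in the order the post lists them (assumed to be the lookup order)."""
    return [
        lambda: from_environment(env),
        lambda: from_home_directory(home),
        from_instance_profile,
    ]


def resolve_credentials(providers: Iterable[Callable[[], Optional[Credentials]]]) -> Credentials:
    """Return credentials from the first provider that finds any."""
    for provider in providers:
        creds = provider()
        if creds is not None:
            return creds
    raise RuntimeError("No AWS credentials found in any provider")
```

Calling `resolve_credentials(default_chain())` returns the first key pair found or raises `RuntimeError`. The point of the chain is that no step has to carry hardcoded keys; the environment decides where they come from.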
Improving the Core Pentaho Data Integration (PDI) Engine:
Pentaho has improved its streaming capabilities. Previously, Pentaho had a streaming engine, but the tooling had been built around a batch-processing mindset; PDI has now been enhanced into a core streaming engine. PDI now has two new streaming data sources: MQTT Input & Output and JMS Input & Output. There is even a Safe Stop for streaming processes, so you can now stop streaming transformations safely without losing records. Safe Stop is also available for batch transformations in Spoon, Carte, and the Abort step. Workflow handling for streaming data sets and streaming data services is better, and the Transformation Executor step can be used to run a sub-transformation with Spark on AEL.
More Big Data formats are now supported natively. Optimized Row Columnar (ORC) Input and Output transformation steps have been added so that PDI can work with this indexed, columnar data serialization format, easing the development of pipelines that handle it. ORC files can be handled natively through these input and output steps from any standard storage system, and they are also accessible through Virtual File System (VFS) drivers. To improve performance, the steps can execute natively in the Pentaho engine or in Spark using AEL.
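To make the Safe Stop idea concrete, here is a minimal Python sketch of a two-step stream, assuming a reader thread that feeds a processing thread through a bounded buffer. A safe stop stops pulling new records from the source but lets every record already read flow through to the end, so nothing that was read is dropped. `StreamingPipeline` and its methods are hypothetical names; this illustrates the concept, not how PDI implements it.

```python
import itertools
import queue
import threading
import time
from typing import Iterator, List

_END = object()  # marker telling the processing thread that reading has finished


class StreamingPipeline:
    """Hypothetical two-step stream: a reader thread feeds a processing thread via a buffer."""

    def __init__(self, source: Iterator[int]):
        self._source = source
        self._buffer: queue.Queue = queue.Queue(maxsize=100)
        self._stop_requested = threading.Event()
        self.read_count = 0
        self.processed: List[int] = []
        self._reader = threading.Thread(target=self._read, daemon=True)
        self._processor = threading.Thread(target=self._process, daemon=True)

    def _read(self) -> None:
        try:
            # Check for a stop request *before* pulling each record, so nothing is
            # taken from the source without also being handed downstream.
            while not self._stop_requested.is_set():
                try:
                    record = next(self._source)
                except StopIteration:
                    break
                self._buffer.put(record)
                self.read_count += 1
        finally:
            self._buffer.put(_END)  # always tell the processor that input is finished

    def _process(self) -> None:
        # Drain the buffer until the end marker arrives; in-flight records are never skipped.
        while True:
            record = self._buffer.get()
            if record is _END:
                break
            self.processed.append(record * 2)  # stand-in for real per-record work

    def start(self) -> None:
        self._reader.start()
        self._processor.start()

    def safe_stop(self) -> None:
        """Stop reading new records, then wait until everything already read is processed."""
        self._stop_requested.set()
        self._reader.join()
        self._processor.join()


if __name__ == "__main__":
    pipeline = StreamingPipeline(itertools.count())  # an endless source
    pipeline.start()
    time.sleep(0.05)
    pipeline.safe_stop()
    # Every record that was read has been processed, in order.
    assert pipeline.processed == [i * 2 for i in range(pipeline.read_count)]
    print(f"read and processed {pipeline.read_count} records")
```

The contrast with a hard abort is that an abort would kill both threads immediately, leaving whatever sits in the buffer unprocessed.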
You also get access to enhanced worker nodes through the Hitachi Vantara Foundry project. Added features include improved monitoring, with accurate propagation of work item status; performance improvements from optimized startup times when executing work items; customizations externalized from the Docker build process; job cleanup functionality; and more.
Expansion of Some Key Analytics Use Cases:
Pentaho 8.1 also improves its Business Analytics capability. Visualizations now have a continuous axis for time dimensions: Line, Area, and Chart visualizations display time-dimension data continuously, so data points are spaced in proportion to the time elapsed between them, giving a more visually accurate picture of data trends. Previously, the time axis placed discrete data points at equal spacing. Real-time streaming support for dashboards and improved time-series visualization are much-needed features. There is also improved data exploration within the PDI tool itself, and I know for a fact that many of BizCubed's clients will be excited about the repository browser improvements.
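The difference between the old and new time axes comes down to how x positions are computed. In this minimal sketch (the function names and the pixel `width` parameter are hypothetical, and a simple linear mapping is assumed), three daily points with a gap between 2 and 31 January get equal spacing on a discrete axis, whereas a continuous axis places them in proportion to elapsed time.

```python
from datetime import datetime
from typing import List


def discrete_positions(count: int, width: float) -> List[float]:
    """Previous behaviour: points are equally spaced, whatever their timestamps."""
    if count == 1:
        return [0.0]
    step = width / (count - 1)
    return [i * step for i in range(count)]


def continuous_positions(times: List[datetime], width: float) -> List[float]:
    """Continuous axis: x is proportional to the time elapsed since the earliest point."""
    start, end = min(times), max(times)
    span = (end - start).total_seconds()
    if span == 0:
        return [0.0] * len(times)
    return [(t - start).total_seconds() * width / span for t in times]


if __name__ == "__main__":
    days = [datetime(2018, 1, 1), datetime(2018, 1, 2), datetime(2018, 1, 31)]
    print(discrete_positions(len(days), 300.0))  # [0.0, 150.0, 300.0]
    print(continuous_positions(days, 300.0))     # [0.0, 10.0, 300.0]
```

On the discrete axis the one-day step and the 29-day gap look equally wide; on the continuous axis the gap is 29 times wider, which is what makes trends easier to read accurately.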
So there you have it, folks: that's my take on the Pentaho 8.1 release. If you would like to talk more about Pentaho, or about how BizCubed can help you make better decisions with Pentaho (and other tools), get in touch with us.