How Modernizing ETL Processes Helps You Uncover Business Intelligence

We live in a world of data: there is more of it than ever before, in an ever-expanding array of formats and locations. Data management is how data teams tackle the challenges of this new world and help their organizations and their clients flourish.

In recent years, data has become vastly more accessible to organizations. This is due to the proliferation of data storage systems, the falling cost of data storage, and modern ETL processes that make storing and accessing data more practical than ever. As a result, organizations have been able to grow in every aspect of their business: being data-driven has become universal and essential to survival in the current environment. This article discusses how modernizing ETL processes helps organizations uncover the day-to-day benefits of Business Intelligence.

First of all, what exactly is an ETL process?

ETL stands for Extract, Transform, and Load. It is the backbone of modern data-driven organizations, and the process is defined by its three stages; a minimal code sketch of the full cycle follows the list below.

  • Extraction: Raw data is extracted or obtained from different sources (such as a database, application, or API).
  • Transformation: The raw data is modified, cleaned (made free from errors), and standardized so that it becomes easier for the end user to work with.
  • Loading: Once the data has been shaped to the user's needs, it is loaded into a target system, typically a Business Intelligence (BI) tool or a database.
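
To make the three stages concrete, here is a minimal sketch in Python. The API URL, field names, and the SQLite target are illustrative assumptions, not a reference to any particular system.

```python
import sqlite3

import requests

# Extract: pull raw records from a source API (placeholder URL).
raw_orders = requests.get("https://api.example.com/orders").json()

# Transform: clean the records and standardize their shape.
clean_orders = [
    {
        "order_id": order["id"],
        "amount_usd": round(float(order["amount"]), 2),       # normalize amounts
        "country": order.get("country", "unknown").upper(),   # standardize case
    }
    for order in raw_orders
    if order.get("amount") is not None  # drop records with missing values
]

# Load: write the cleaned rows into a target system (SQLite as a stand-in).
conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount_usd REAL, country TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (:order_id, :amount_usd, :country)", clean_orders
)
conn.commit()
```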

Understanding the ETL process: the foundation of data-driven organizations

Every organization needs every team inside the business to make smarter, data-driven decisions. Customer support teams look at ticket trends or run in-depth analyses to identify areas where better onboarding and documentation are needed. Marketing teams need better visibility into their ad performance across various platforms and the ROI on their spend. Product and engineering teams dig into usage metrics or bug reports to help them prioritize their resources.

ETL processes enable these teams to get the data they need to understand and perform their jobs better. Organizations ingest data from a wide array of sources through the ETL cycle. The prepared data is then available for analysis and use by the different teams who need it, as well as for advanced analytics, embedding into applications, and other data monetization projects. Whatever you want to do with the data, you need to pass it through ETL first.

This entire ETL process is genuinely challenging to run. It typically requires full-time data engineers to develop and maintain the scripts that keep the data flowing. Data providers frequently change their schemas or APIs, which breaks the scripts that power the ETL cycle. Each time there is a change, the data engineers scramble to update their scripts to accommodate it, resulting in downtime. With organizations now needing to ingest data from so many disparate sources, maintaining ETL scripts for each one does not scale.

Modernizing ETL processes makes life better

The modern ETL process follows a slightly different order of operations, dubbed ELT. This new pattern emerged with the introduction of tools that modernize the ETL workflow, as well as the rise of modern data warehouses with relatively low storage costs.

Today, ETL tools do the heavy lifting for you. They provide pre-built integrations for many of the major SaaS applications and have teams of engineers who maintain those integrations, taking the heat off your in-house data team. These ETL tools are built to connect to the major data warehouses, letting organizations plug in their applications on one side and their warehouse on the other while the ETL tools handle the rest.

Users can usually control configuration through a simple drop-down menu inside the applications, removing the need to stand up your own servers or EC2 boxes or to build DAGs to run on platforms like Airflow. ETL tools also typically offer more robust options for adding new data incrementally or updating only new and modified rows, which allows for more frequent loads and data that is closer to real time for the business. With this simplified process for making data available for analysis, data teams can find new applications for data that generate value for the company.
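
To illustrate the incremental pattern, here is a hedged Python sketch that loads only rows changed since the last run, tracked with a stored watermark. Table and column names are assumptions for the example.

```python
import sqlite3

src = sqlite3.connect("source.db")
tgt = sqlite3.connect("warehouse.db")

# Read the high-water mark left behind by the previous load.
tgt.execute("CREATE TABLE IF NOT EXISTS etl_state (last_loaded_at TEXT)")
watermark = tgt.execute("SELECT MAX(last_loaded_at) FROM etl_state").fetchone()[0]
watermark = watermark or "1970-01-01T00:00:00"

# Extract only the rows created or modified since the watermark.
new_rows = src.execute(
    "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?", (watermark,)
).fetchall()

# Upsert: replace modified rows instead of duplicating them.
tgt.execute(
    "CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, amount REAL, updated_at TEXT)"
)
tgt.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", new_rows)

# Advance the watermark so the next run starts where this one ended.
if new_rows:
    tgt.execute("INSERT INTO etl_state VALUES (?)", (max(r[2] for r in new_rows),))
tgt.commit()
```

Run on a schedule, a job like this keeps the warehouse close to real time without reloading unchanged history.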

ETL processes and data warehouses

Data warehouses are the present and future of data and analytics. Storage costs on data warehouses have dropped in recent years, which allows organizations to load as many raw data sources as possible without the cost concerns they may have had before.

Today, data teams can ingest raw data before transforming it, allowing them to transform it inside the warehouse rather than in a separate staging area. With the increased availability of data, and with SQL as the common language for accessing it, the business gains far more flexibility in using its data to make the right decisions.
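
A hedged sketch of the ELT order of operations: raw rows land first, then the transformation happens inside the warehouse itself, in SQL. The schema, and SQLite standing in for a cloud warehouse, are assumptions for illustration.

```python
import sqlite3

wh = sqlite3.connect("warehouse.db")

# Load: land the raw data untouched in a staging table.
wh.execute("CREATE TABLE IF NOT EXISTS raw_orders (id TEXT, amount TEXT, country TEXT)")
wh.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "19.99", "us"), ("2", None, "de"), ("3", "5.00", "fr")],
)

# Transform: clean inside the warehouse, using plain SQL.
wh.executescript(
    """
    DROP TABLE IF EXISTS orders_clean;
    CREATE TABLE orders_clean AS
    SELECT id,
           CAST(amount AS REAL) AS amount_usd,
           UPPER(country)       AS country
    FROM raw_orders
    WHERE amount IS NOT NULL;
    """
)
wh.commit()
print(wh.execute("SELECT * FROM orders_clean").fetchall())
```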

Modernized ETL processes deliver better and quicker outcomes

Under the traditional ETL process, as data and processing requirements grew, so did the risk that on-premise data warehouses would fail. When this happened, IT had to fix the issue, which usually meant adding more hardware.

The modern ETL process in today's data warehouses avoids this issue by offloading system resource management to the cloud data warehouse. Many cloud data warehouses offer compute scaling that allows for dynamic scaling when requirements spike. This lets data teams maintain consistent performance while running a growing number of computationally expensive data models and ingesting ever larger data sources. The reduced cost of compute power, along with compute scaling in the cloud data warehouse, lets data teams efficiently scale resources up or down to suit their needs and better guarantee zero downtime. Essentially, rather than having your in-house data and/or IT teams worrying about your data storage and compute issues, you can offload that almost entirely to the data warehouse provider.

Data teams can then build tests on top of their cloud data warehouse to monitor their data sources for quality, freshness, and so on, giving them quicker, more proactive visibility into any issues in their data pipelines.
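
For example, a freshness and quality test could run against the warehouse on a schedule. This is a hedged sketch; the table name, the naive-UTC ISO timestamp format, and the 24-hour threshold are assumptions chosen for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

wh = sqlite3.connect("warehouse.db")

# Freshness test: fail if no order has arrived in the last 24 hours.
latest = wh.execute("SELECT MAX(updated_at) FROM orders").fetchone()[0]
assert latest is not None, "orders table is empty"
threshold = datetime.utcnow() - timedelta(hours=24)
assert datetime.fromisoformat(latest) > threshold, f"orders stale: last row at {latest}"

# Quality test: fail if any amount is negative or missing.
bad = wh.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL OR amount < 0"
).fetchone()[0]
assert bad == 0, f"{bad} rows failed the amount check"
print("All warehouse checks passed.")
```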

Check out our video on how to use the iWay2 Talend Converter for your integration purposes.

 

5 Ways Talend Helps You Succeed At Big Data Governance and Metadata Management

The uses of Talend are multidimensional when it comes to Big Data Governance, making work easier for developers and managers alike. With legacy systems, many aspects can bring challenges to business users, such as not understanding the business value of data, a lack of data leadership, or poor audit-trail readiness. Against these and the several other hurdles big data governance can pose to organizations, metadata management can be a precious asset.

This blog will focus on how Talend can help a business mitigate the pitfalls, thanks to the five core components that make up the fabric of this robust solution.

Interested to know how? Let’s dive right into it!

1. Talend Studio’s Metadata by Design

Without metadata, you cannot project a holistic and actionable overview of the information supply chain. Having this view is a necessity for change management, transparency, and audit-ready traceability of data flows. It also increases data accessibility through easy-to-use access mechanisms like visual maps and search. While metadata can be retro-engineered in some instances, it is far more convenient to gather, process, maintain, and trace it at the source, by design.

With the help of Talend, all the data flows are created with a visual and metadata-rich ecosystem. As a result, it facilitates fast-paced development and product deployment. As soon as the data flows start running, Talend furnishes a detailed glimpse of the information supply chain.

In the Talend Big Data environment, this is important since many powerful data processing frameworks are far less metadata-aware than traditional data management languages such as SQL. Talend Open Studio gives organizations high levels of abstraction in a zero-coding approach to help manage, govern, and secure Hadoop data-driven systems.

Talend Open Studio possesses a centralized repository that maintains a perpetually updated version of an organization's data flows, which can easily be shared with multiple data developers and designers. This also makes it possible to export data flows to tools like Apache Atlas, Talend Metadata Manager, or Cloudera Navigator, which expose them to a broader audience of data practitioners.

2. Talend Metadata Bridge: Synchronize Organizational Metadata Across Data Platforms

Talend Metadata Bridge enables easy import and export of metadata from Talend Studio and facilitates access from practically all data platforms. Talend Metadata Bridge provides over a hundred connectors to assist in harvesting metadata from:

  • ETL tools
  • Modeling tools
  • NoSQL or SQL databases
  • Popular BI and Data Discovery tools
  • Hadoop
  • XML or Cobol structures

The bridge enables developers to create data structures once and propagate them through several platforms and tools over and over again. It becomes easier to safeguard standards, usher in changes, and oversee migrations, since data formats from any third-party tool or platform can be translated into Talend.

3. Talend Big Data: Overcome Hadoop Governance Hurdles

By design, Hadoop accelerates data proliferation, generating more governance challenges for organizations. Traditional databases provide a single point of reference for data, related metadata, and data manipulations; Hadoop, by contrast, combines multiple data storage and processing alternatives.

Hadoop also replicates data across nodes as part of its high-availability strategy, so copies of raw data are made between processing steps.

These factors are a substantial threat to data governance, which makes data lineage even more crucial for traceability and audit-readiness of data flows within Hadoop.

However, Hadoop is an open and expandable community-centric framework. Its weaknesses have inspired innovative projects that mitigate these challenges and turn them into an advantage.

Talend Big Data integrates seamlessly with Apache Atlas or Cloudera Navigator and projects detailed metadata for the designated data flows into these third-party data governance ecosystems. Through this functionality, Talend provides data lineage capabilities to such environments, at a depth that hand-coded Hadoop or Spark data flows cannot match.

With the help of Apache Atlas and Cloudera Navigator, the metadata generated by Talend is easily connected to various data points. It can also be searched, visualized as maps (data lineage), and shared with authorized users in a Hadoop environment beyond Talend administrators and developers. These tools also make metadata more actionable, since they can trigger actions for particular datasets on a schedule or on data arrival.

4. Superior Data Accessibility: Democratize Your Data Lake

Until recently, big data governance was perceived as an administrative restriction rather than a value-add for business use cases. However, it has several benefits.

Let’s take the analogy of packaged food. Having information regarding the name, ingredients, chemical composition, weight, quantity, nutrition value, and more details is essential to gain a fair understanding before you consume any edibles.

The same principles apply to data.

Talend offers an extensive Business Glossary in Talend Metadata Manager that lets data stewards maintain important business definitions for the data. They can also link such data to the tools and environments where business users access it. Talend Data Preparation similarly brings its own independent dataset inventory to enable open access, cleansing, and shaping of data as part of a self-service approach. With the principle of self-service at the forefront, Talend makes sure to empower users with all the knowledge they require.

5. Talend Metadata Manager: Manages and Monitors Data Flows beyond Hadoop

It is no longer feasible to manage each data source at a single location. Even though legacy enterprise systems like SAP, Microsoft, and Oracle are not going anywhere, cloud applications will still proliferate. Traditional data warehouses, as well as departmental Business Intelligence, will coexist with additional data platforms in the future.

This not only increases the demand for environments like Talend Data Fabric, which make managing data flows across environments seamless, but also drives the requirement for a platform that gives business users a holistic view of the information chain, wherever data is gathered. Organizations working in heavily regulated environments mandate these capabilities for maintaining audit trails.

Conclusion:

Talend Metadata Manager provides a business with much-needed control and visibility over metadata, so that it can successfully mitigate risk and meet compliance requirements in organization-wide integration, with end-to-end tracking and transparency. Metadata Manager brings together all the metadata, be it from Hadoop, Talend, or any data platform supported by the Metadata Bridge. It also provides a graphic information supply chain that gives access to full data lineage and audit readiness. As icing on the cake, Talend converts this holistic view into a language and data map that everyone can easily understand, from the people responsible for data usability, integrity, and compliance to the business users.

 

Do you know how a single customer view is critical to business success?

In this modern world of multiple touchpoints, other businesses can grab your customers' attention in a moment. Your customers may spend their money with a competitor because of its service, a great impression from a single interaction, or the information that is readily accessible about that company as reviews, feedback, testimonials, and so on. Your customers can connect to any other business across the world at any time through ads, social media, emails, or any other medium.

Similarly, other businesses may use data to attract your loyal customers with highly personalized experiences, deals, cashback, and more. Your customers will then expect the same kind of service, quality, assistance, and customer experience from your business too. Otherwise, they always have the option to move on.

However, to retain customers and understand their expectations, the one solution we have is DATA.

The proliferation of information available to customers has made customer data a critical asset for businesses. In turn, businesses can leverage data-driven technologies to reach customers across multiple mediums and win their attention in the first interaction, using the endless volumes of data available globally. Data has re-engineered typical business models in many organizations. Most businesses today are data-driven: many brands use data to push ads, engage customers, and raise sales, and they develop products and services and analyze future demand through customer insights.

Though data plays a vital role in every phase of business, it is crucial to have high-quality data for accurate and reliable insights from analysis. Only then does the smarter use of data add extra value to the business; otherwise, the data becomes less useful. This is why many big brands are careful with their data, and this care sets them apart from others. Experts who have studied the tactics of booming brands consistently point to the wise use of data as a lever for growth.

Companies like Amazon, Starbucks, Lifestyle stores, Netflix, and others are selective about using only high-quality data.

Only the right data gives the right results from analysis. Data is all around us: over 90% of it was created in the last few years, and the world produces over 2.5 quintillion bytes of data every day. But only high-quality data can help your organization identify your customers and their expectations through data analysis.

Then, what traits define data as high-quality?

High-quality data has the potential to shape core business performance, analysis, and ROI. Every business requires data apt for its type of trade. Business data includes first-party, second-party, and third-party data. Foresighted companies pick and use only suitable, high-quality customer data based on future demands. The following traits define the quality of data:

  • Accuracy – the data holds information without errors
  • Completeness – each data record has complete details
  • Reliability – the data is maintained without overlap or fragmentation
  • Relevance – only data useful to your organization is kept, and unnecessary or unused data is disposed of, so insights stay valuable
  • Timeliness – the data is gathered, updated, and handled regularly

Well-established data governance and management are must-haves for your business to maintain the data quality needed to gain valuable information about your customers.
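
As a hedged illustration of how these traits translate into concrete checks, here is a small Python sketch using pandas; the column names, sample data, and thresholds are assumptions for the example.

```python
from datetime import datetime, timedelta

import pandas as pd

customers = pd.DataFrame(
    {
        "email": ["a@x.com", "a@x.com", None, "c@y.com"],
        "age": [34, 34, 29, -5],
        "updated_at": pd.to_datetime(
            ["2024-05-01", "2024-05-01", "2021-01-01", "2024-05-02"]
        ),
    }
)

report = {
    # Accuracy: values must fall within plausible ranges.
    "inaccurate_age": int((~customers["age"].between(0, 120)).sum()),
    # Completeness: required fields must not be missing.
    "missing_email": int(customers["email"].isna().sum()),
    # Reliability: no duplicated records for the same customer.
    "duplicate_records": int(customers.duplicated(subset="email").sum()),
    # Timeliness: records should have been touched within the last year.
    "stale_records": int(
        (customers["updated_at"] < datetime.now() - timedelta(days=365)).sum()
    ),
}
print(report)
```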

How do businesses make use of high-quality data?

Absolutely, through a unified customer view along with data analysis.

65% of respondents mentioned that data analysis played a crucial role in delivering a better customer experience, as per the annual global survey conducted by Econsultancy and Adobe.

Gartner's report concludes that 81% of companies consider customer data analysis a key competitive differentiator.

What is a unified customer view (a customer 360-degree view, or single customer view)?

A single customer view represents customer profiles aggregated from data collected across several internal systems and grouped through propensity modeling. These profiles give more information on each customer and group customers with similar interests, behaviors, preferences, and other traits.

Why do you need a Customer 360-degree view?

The concept of customer 360, or the unified customer view, is much narrower than Master Data Management. Companies often gather and store customer data records in CRMs, ESPs, PoS systems, websites, social media channels, eCommerce platforms, and other systems. A lack of data maintenance can affect data quality and hygiene; in turn, the data becomes duplicated, unstructured, incomplete, un-actionable, inconsistent, ungoverned, and outdated.

This low-quality data benefits neither the organization nor the data owners. Consequently, the organization fails to recognize potential customers and has no answers to questions like:

  • Who are the most valuable buyers?
  • Where do I have up-sell and cross-sell possibilities with current customer records?
  • Which of the marketing efforts is driving the sales?
  • How to improve customer service?
  • What are the areas to focus on for improving service or product quality?
  • What are the customer preferred channels for interactions?
  • What are the chances of business growth in the next quarter?

Many such questions go unanswered because of poor data quality, the lack of a single customer view, and the resulting failure to get an apt analysis of the data. This is how many companies with massive customer bases and numerous services and products sometimes lose loyal customers: through a lack of data quality and of a real-time single customer view.

The team of experts at Artha developed a Customer 360 accelerator to overcome these hurdles. By leveraging Customer 360 on a data platform, organizations can ease communication as well as real-time personalization for customers by:

  • Easing master data management
  • Resolving security and compliance issues
  • Avoiding duplication
  • Improving data quality and consistency
  • Smoothing the flow of internal systems
  • Making marketing efforts more effective, with higher ROI, by ensuring accuracy in targeting
  • Enhancing the customer experience

What is the approach of the Customer 360 accelerator?

Customer 360 follows a Crystal ball strategy by combining data gathered as:

Behavioral data – identifies customer actions, reactions, interests, and preferences, so that behavioral patterns can be analyzed to understand customer expectations. Behavioral data also includes campaign response data, the recorded responses of customers. The information extracted from this data helps in identifying and acquiring new customers with the same behavioral patterns, besides informing efforts to retain existing ones.

Interaction data – refers to communication history records and touchpoints in a customer journey. This data gives information on customers' online activities, such as using multiple devices, shopping, abandoning the cart, reading or writing reviews, browsing for a specific product or service, and watching related videos, all of which result in multiple digital interactions. A customer 360 view of this data helps companies maintain a direct, experiential relationship with their customers and serve them better with relevant, timely offers. Further, a business can use this information for retargeting.

Demographic data – represents customer groups based on geographic location, age, gender, occupation, education, income, and other attributes. This data outlines information for a better understanding of potential customers or buyers, along with their interest in buying a product or service based on several attributes.

Transactional data – gives an overview of previous purchases, the products or services a customer may like to purchase from a brand or from other brands, preferred modes of payment, frequency and timing of purchases, name, contact details, subscriptions or membership cards, etc. This data becomes useful when predicting demand for a product or service, and the profits and risks involved, in the next quarter.

Artha's Customer 360 unifies data from multiple sources to create a single view of customer data, providing a 360-degree solution for customer segmentation, relevance, analytics, and targeting.
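
To make the unification step concrete, here is a minimal, hedged Python sketch that merges records from two internal systems into one profile keyed by a normalized email address. Real identity resolution adds fuzzy matching and survivorship rules; the source names and fields here are assumptions.

```python
from collections import defaultdict

# Records as they might arrive from two internal systems (illustrative data).
crm_records = [
    {"email": "Jane@x.com", "name": "Jane Doe", "phone": "555-0101"},
]
ecommerce_records = [
    {"email": "jane@x.com", "last_order": "2024-05-02", "lifetime_value": 740.0},
    {"email": "sam@y.com", "last_order": "2024-04-11", "lifetime_value": 120.0},
]

# Unify: merge every record that shares a normalized email into one profile.
profiles = defaultdict(dict)
for record in crm_records + ecommerce_records:
    key = record["email"].strip().lower()  # normalize the matching key
    for field, value in record.items():
        if field != "email":
            profiles[key].setdefault(field, value)  # first non-missing value wins

for email, profile in profiles.items():
    print(email, "->", profile)
```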

What are the challenges involved while implementing Customer 360?

The biggest challenge in implementing Customer 360 is that data is stored in data warehouses and across various systems in inconsistent and fragmented form. Inconsistency in the formats, structure, definitions, and dimensions of data is often the result of poor data quality and data governance practices. This further makes customer records inaccurate, duplicated, incomplete, outdated, and unreliable for analysis and prediction.

Good data quality and management practices maximize the value of Customer 360 initiatives and enable the most advanced analysis. Therefore, it is vital to establish data quality and hygiene before leveraging Customer 360.

We at Artha recommend that our clients instill good data governance and master data management practices to meet the business objectives they aim to achieve through Customer 360.

Organizations must have a powerful strategy when planning and implementing Customer 360 for the business. Things that you should consider are:

  • Creation of your buyer journey chart
  • Integration of the multiple data resources
  • Ability to identify and group the customers based on business requirements and objectives
  • Accessibility of the information to the internal teams on single customer views

The above-mentioned steps can lead your business towards the right solution and also guide your teams to:

  • Personalize product recommendations to groups of customers with similar traits
  • Notify customers of cart abandonments
  • Give a consistent buyer journey across multiple devices used by the same customer
  • Customize service messages and emails, along with discount coupons and offers

The list goes on, as Customer 360 solutions can serve businesses in innumerable ways.

Every customer is unique and at a different stage of buying. You need to know the type of customer you have (prospect, active, lapsed, or at-risk) and their expectations before reaching out to them. Customer 360 can give you a single customer view with detailed information about each customer, so you can reach them with personalized messages, emails, ads, and more across multiple mediums.

Conclusion:

Good data quality and a single customer view can make your business more successful by providing detailed customer insight through analysis. Customer 360 can accelerate your data-driven operations so you know your customers better than ever before.

Our experts at Artha Solutions have provided valuable assistance and insights in planning business strategies and implementing business solutions for many SMBs (small to medium businesses) and Fortune 500 enterprises. Our innovative data solutions have helped organizations overcome technical challenges, raise business performance, and fulfill their business objectives.

 

Here Are 9 Ways To Make The Most Of Talend Cloud

The business ecosystem at present revolves largely around big data analytics and cloud-based platforms. Across companies, the functions that involve decision-making and day-to-day operations depend on data collected in data storage systems. Such organizations, hence, are focused on extracting important information from stored data. Data goes through a series of transformations, such as merging, cleaning, and tidying, before it can be converted into useful information.

Talend gives businesses a range of data solution tools to utilize such information. Using its products, organizations can democratize integration and enable IT professionals and companies to execute intricate architectures in simpler, more coherent ways. Talend also covers all phases of integration, be it the technical or the business layer, with all of its products arranged on a unified interface. As a highly flexible, performance-driven, and scalable open-source product for data extraction and manipulation on big data, Talend has several benefits and is competitively faster than other brands out there.

Today, we will discuss a few ways that can help you make the most of Talend Cloud. Explained below are 9 ways to use Talend Cloud’s services in the best way for your organization in 2021:

Remote Engines in Virtual Private Cloud Ecosystems

To get the best out of Talend Cloud, we advise organizations to utilize Remote Engines in lieu of Cloud Engines in their Virtual Private Cloud (VPC) ecosystems (or environments). Whatever VPC you are using, it is best practice to ensure that an instance with adequate capability and capacity is designated as the Remote Engine. Talend strongly recommends against using Cloud Engines for this.

Adopting Git Best Practices While Using Talend Cloud

Talend dedicates itself to helping organizations streamline their processes, which is why it also follows a set of best practices from Git. A few of these practices consist of employing centralized workflows, using tags as required, and creating branches. Reading more about Git best practices will bring organizations and developers a wealth of benefits while running Talend Cloud. You can check out the resources below, which are officially endorsed by Talend:

  • Best practices for using Git with Talend
  • Best Practices Guide for Talend Software Development Life Cycle: Concepts
  • Work with Git: Branching & Best Practices
  • Talend Data Fabric Studio User Guide: Working with project branches and tags

Using Studio on Remote Engines to Directly Run, Test, or Debug Jobs

While testing, running, and debugging Jobs on Remote Engines can be a continuous cycle, you can get the task done more efficiently directly from Studio. In versions preceding Talend 7.0, a JobServer embedded within a Remote Engine required you to manually configure remote execution in the Studio preferences. From Talend 7.0 onward, Remote Engines classified as debugging engines are automatically added to Studio. You can learn more about configuring this capability in the Talend Cloud Data Integration Studio user guide section titled "Running or debugging a design on a Remote Engine from Talend Studio".

Use Studio to Design the Job for Orchestration, Restartability, and Logging

Orchestration is inherent to working in the cloud: execution actions on the cloud can start, stop, and report the status of a flow. Talend recommends using subJobs for orchestrating pipelines. Make sure to load Cloud logs to an S3 bucket, while also setting up an ELK stack (Elasticsearch, Logstash, and Kibana). In Studio, you can use components like tStatCatcher to load statistics to an error table. As with on-premises deployments, it is recommended to employ a central, reusable method for error handling and logging. All in all, design Jobs in Studio with restartability in mind.

Set up Notifications on Talend Cloud

To enable notifications on Talend Cloud, go to Settings and find the Administration menu. As a best practice, you can use the predefined notifications recommended by Talend.

Start, Stop, and Receive Execution Status by Leveraging Talend Cloud API

You can take advantage of the Talend Cloud Public API, using tools like Swagger, to carry out flows, receive flow statuses, and end Jobs. The Talend Summer '18 release enables continuous delivery for Cloud development by letting users publish Cloud Jobs directly from Talend Studio with a Maven plug-in. This feature requires Talend 7.0.1 Studio and enables automation and orchestration of the complete integration process by building, testing, and promoting Jobs to each Talend Cloud ecosystem (or environment). For clarification and further information, you can refer to the Talend documentation.
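
As a purely illustrative, hedged sketch of driving a cloud API of this kind from Python: the base URL, endpoint paths, payload fields, and status values below are placeholders, not Talend's documented contract; consult the Swagger definition in the Talend documentation for the real API.

```python
import os
import time

import requests

# Placeholders: real values come from your account and the published API reference.
BASE_URL = "https://api.example-region.cloud.example.com"
HEADERS = {"Authorization": f"Bearer {os.environ['CLOUD_API_TOKEN']}"}

# Start an execution of a published task (endpoint and body are assumptions).
resp = requests.post(
    f"{BASE_URL}/executions", json={"executable": "my-task-id"}, headers=HEADERS
)
resp.raise_for_status()
execution_id = resp.json()["executionId"]

# Poll until the execution reaches a terminal status.
while True:
    status = requests.get(
        f"{BASE_URL}/executions/{execution_id}", headers=HEADERS
    ).json()["status"]
    if status in ("SUCCESS", "FAILED", "CANCELLED"):
        break
    time.sleep(10)

print(f"Execution {execution_id} finished with status {status}")
```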

Update Studio Once Talend Cloud Gets an Upgrade

Upgrading your Talend Studio along with a Talend Cloud update is always the best practice for getting the greatest efficiency from the platform. Talend Studio is backward compatible to a certain extent; you can look up the Talend Open Studio for Data Integration Getting Started Guide for more information.

Since Studio 6.2 is no longer supported by Talend Cloud, upgrading Studio will get the job done.

Shared Workspaces per Ecosystem for All Promotions

Talend best practices recommend using both a Shared and a Personal workspace (in the same way as projects), while assigning a Remote Engine to every workspace.

  • True to its name, a Personal workspace is meant to be used solely by the owner.
  • Development teams are recommended to use Shared workspaces so that code is shared and centralized. Make sure the Shared workspace name is consistent across ecosystems.

Consistency in Artifact and Workspace Names Across the Company

Finally, Talend highly recommends maintaining consistency in the names of artifacts and workspaces across the company. This is one of the simplest and most common best practices, and it should be implemented for every software application. For instance, a component naming pattern such as name_direction(from/to)_function can be your own standard, as long as it remains consistent. Referring to the best practices for object naming conventions will help.

Talend Cloud can be a game-changer for organizations seeking streamlined data solutions. However, reaping the maximum benefit and efficiency from it takes putting these best practices into action. We hope this blog helped you gain better insight into how to use Talend Cloud to achieve your desired optimization.

 

How To Get Started With Migrating On-Premise Talend Implementations To The Cloud

If you're an on-premises Talend client and your organization decides to move all operations to the cloud, you have a huge task ahead of you. The organization needs to license Talend Cloud and take on the task of migrating the existing Talend projects and Jobs to the Cloud product. It can be quite a long undertaking if you go about it on your own. Luckily, you have us on your side to help you prepare and get started with an on-premise Talend migration to the Cloud.

This blog will cover all the know-how to prepare your systems for the migration so that the process transitions smoothly with no errors. Without any further stalling, let’s get started!

What should one know to get started with on-premise implementations migration to Talend Cloud?

The keys to a seamless Talend Cloud migration are assessment and correct planning. This blog will show you some particular factors that can hinder your success while carrying out a self-service Talend migration. We advise you to understand your existing installations first, and then formulate an effective plan for the task.

Assessment

Scan your on-premise Talend installations for the items mentioned below:

  • Do you have an on-premises Talend version that is older than 6.4?
  • Does the company possess Big Data Jobs that need to be migrated to the Cloud?
  • Does the organization deploy real-time Jobs?
  • Does the company utilize more than one TAC?
  • Do you need or utilize Talend CI/CD?
  • Does the organization utilize more than 2 to 3 Talend projects?
  • Do the company projects consist of more than 100 Jobs?

These factors may make your migration more challenging as a self-service project. In such cases, we advise you to ask for expert help from Talend Professional Services to gain the full benefit of their experience.

Audit

A Talend Audit Report will provide better insights into your existing Talend projects and the Jobs in them. The report analyzes the Jobs included and provides highly useful information, such as a complexity rating for every Job and a list of items, like context variables, that could require changes before migrating to Talend Cloud.

In Talend versions before 7.0, this can be found in the Command Line utility; in Talend 7 and later versions, it is part of the Studio.

Planning

You also need to create a plan that incorporates the tasks and resources for the migration project. When putting the plan into motion, take the points below into consideration:

  1. Make more time for complex Job migrations (per the audit report)
  2. What obsolete Jobs can be removed?
  3. Do you use Subversion in the on-premise Talend source control? If yes, adding a Subversion to Git migration in the strategy is necessary.
  4. How many Talend Job Servers are currently in use?
  5. Will you be using Compute Engines to run Jobs?
  6. Make sure that the migration plan consists of a backout strategy in case things don’t go as expected.

Licensing

Purchase the Talend Cloud license, activate it, and assign one of your technical professionals to sign up as the Talend Cloud Security Administrator for the organization's account. This staff member should be the technical leader of the migration project and will be responsible for the management and provisioning of the Talend Cloud implementation plan.

Software

Make sure to use only the Talend Software page to get what your license legitimately entitles you to. Some tips are:

  1. Download the latest version of Talend Studio for Cloud.  Don't use the on-premise version, even if it carries the same version number as your Talend Cloud.
  2. If you choose to deploy the Jobs on Remote Engines, ensure to download the Remote Engine installer for your system’s operating system.

Complex Use Instances

As previously noted, if your on-premise Talend implementation involves complex use cases such as CI/CD, Big Data, or real-time, you ought to consider getting help from Talend Professional Services. The same holds true for on-premises Talend projects that consist of more than a hundred Jobs.

Architecture

While Talend Cloud offers you many pathways, for a self-service migration work only with Talend Studio, Talend Management Console (TMC), Remote Engines, Cloud Engines, and Git source control.

Find Git versions compatible with Talend Cloud before beginning the migration. Will the Jobs need access to local resources from Remote or Cloud Engines after migrating to Talend Cloud? In that case, install one or more Remote Engines. You can install them anywhere, on-premises or in the cloud, and make all of the organization's local data accessible to Talend Jobs. You won't be required to upload any data to the Cloud, and you can cluster the Remote Engines.

If the Jobs you host are compute-intensive and independent of local resources, you can deploy to a Cloud Engine handled by Talend. There is no need to install or configure any extra software, because Talend manages the SLA for these aspects.

Set-up Configuration Roles

Above, we looked at the important information needed to gear up for a Talend Cloud migration, including assessing the present Talend version, projects, and Jobs; that groundwork should enable the organization to move all projects and Jobs to the Cloud project using Git/GitHub source control.

Now, we will take a glance at the setup and enablement of the Talend Cloud account, including the types of users, roles, and groups. The users, roles, and groups of a Talend account, as defined in a TAC, fall into one or several of the categories below:

  • Administrator
  • Security administrator
  • Viewer
  • Auditor
  • Operation manager

The Talend Cloud designer environment comprises different built-in roles, such as:

  • Project administrator
  • Security administrator
  • Environment administrator
  • Integration developer
  • Operator

Conclusion: Getting The Road to Migration Paved

Once you understand the particular help notes and roles we have mentioned in this blog, it will be much more convenient for you to move your data, projects, and Jobs to Talend Cloud. Without a set plan of action, migration can feel like a huge hassle and run into unexpected complications that may need help from professional Talend technical assistants. However, if you wish to keep the process self-driven, follow this initial guide to take stock of all aspects of your on-premise Talend and transition into the next phase of Cloud migration for sure-shot success.

 

 

The Right Digital Transformation Strategy Will Change The Game

Digital transformation refers to the amalgamation of digital technology into all aspects of an organization. Such change brings fundamental shifts in the way a business functions. Organizations are employing this transformation strategy to revamp their businesses into being more efficient, seamless, and profitable.

Digital transformation is more than just migrating data to the cloud. It enables a technological framework that can transform all services and data of a business into functional insights to improve all the areas of a company.

Today we will look into what makes digital transformation necessary and how businesses can use the benefits of this game-changing strategy.

Why is Digital Transformation Important?

Digital transformation can modify the way an organization works across its systems, creating an avenue for innovations that make redundant tasks obsolete and make space for better productivity. Under a digital transformation strategy, your business systems, workflows, processes, and culture are thoroughly scanned to find areas of improvement. Such transformation includes making changes across all levels of a business and leveraging data to make more informed decisions.

The Advantages of Digital Transformation

For many organizations, the catalyst for digital transformation is usually cost, since shifting their data to a public, private, or hybrid cloud can substantially lower their operational cost per year. It also frees up existing hardware and software budgets and lets team members focus on other important projects. Below are the benefits of adopting a digital transformation strategy:

1. Magnified data collection

Most organizations keep collecting heaps of data about their customers, but the real advantage lies in optimizing such data to be analyzed so you can drive your operations forward. Digital transformation facilitates a system to gather the appropriate data while integrating it completely to be used as business intelligence by the management.

It formulates a path for various functional units in an organization to translate raw data into valuable insights throughout different touchpoints. By this function, it creates a cumulative snapshot of your customers’ journey, production, operations, finance, and business avenues.

2. Better resource management

Digital transformation converts business data and resources into a range of tools for the organization. Rather than scattering a company's software and databases, it pulls them together into a centralized location. As per reports, the average enterprise business employed around 900 applications in 2020, which makes it exceptionally challenging to deliver a consistent business experience.

Digital transformation today has the scope to integrate applications, software, and databases into a central repository to enhance business intelligence.

This is not a compartmentalized, single-function effort: it touches every facet of an organization and can spark process innovation across all units.

3. Data-driven Consumer Insights

Data can help a business find better customer insights; when organizations understand their customers and their requirements better, they can make a business strategy that caters to exact demands. Using structured and unstructured data, such as social media insights, an organization can attain business growth.

Data can help business strategies to render more relevant, personalized, and flexible services.

4. An overall better customer experience

Customer expectations have skyrocketed in recent years with respect to brand experience. Since customers are now used to an endless sea of options, competitive prices, and quick delivery, it is harder to keep them loyal to a brand. Customer experience (CX) is the new turf on which organizations fight for market dominance. Gartner reports that over two-thirds of organizations say they compete mainly on customer experience, and projected that figure to reach 81% in 2020.

Industry experts have found that CX has emerged as the primary driver of sustainable business growth. In fact, they believe that even a single-point rise in CX score can be worth millions of dollars in annual growth.

5. Promotes digital culture and collaboration

By helping your team members with the appropriate tools, personalized to their work settings, digital transformation promotes a digital culture.

While such tools make it easy for teams and employees to collaborate, especially while working from home, they also help lift the entire company leagues above the mediocre. Digital culture is going to be highly relevant and critical for better collaboration in the future. It pushes organizations to help their employees upskill via digital learning and make good use of the digital transformation.

6. Enhanced profits

Businesses that take up digital transformation strategies are shown to have improved performance and profitability. Here are some statistics to put things into perspective:

  • 80% of businesses that have successfully executed digital transformation report increased profits.
  • 85% of companies report they have increased their market shares.
  • On average, business leaders are projecting a 23% hike in revenue growth over their competitors.

7. Enhanced agility

Digital transformation makes businesses more flexible and adaptable to their circumstances. Borrowing from the world of software development, organizations can use digital transformation to improve their agility and speed-to-market while employing Continuous Improvement (CI) tactics. This facilitates quicker innovation and adaptation while paving the way to organizational improvement.

8. Growth in productivity

Possessing the appropriate tech tools that work well together helps streamline business workflows and enhance overall employee productivity. By automating most of the redundant, mundane tasks and integrating data across the company, employees can work more efficiently.

Conclusion:

With the right digital transformation strategy, companies can find new opportunities to innovate at reduced costs. Digital technologies like data analytics, cloud, mobile, and social media are changing the business dynamics across industries and companies that embrace these innovative technologies, rather than resisting them, have a better chance at finding success in a digital-first business environment.

About Artha Solutions:

Artha Solutions is a premier business and technology consulting firm providing insights and expertise in both business strategy and technical implementations. Artha brings forward thinking and innovation to a new level with years of technical and industry expertise and complete transparency. Artha has a proven track record working with SMB (small to medium businesses) to Fortune 500 enterprises turning their business and technology challenges into business value.

 

How to Choose the Right Data Management Platform for Your Business?

A Data Management Platform (DMP) helps organizations conduct centralized data management and data sorting, giving businesses greater control over their consumer data. For example, in marketing, a DMP can collect, segregate, and analyze data for the optimization, targeting, and deployment of campaigns to the correct target audience.

Data Management Platforms gather information from first-party sources such as mobile apps, websites, internet transactions, mailing lists, CRM software, and others. Organizations use these sources to find and manage their ideal customers. Apart from first-party sources, they also use third-party sources such as online interactions and user data.

A Data Management Platform can power applications for customer research with the help of customized content. The data is then merged from both online and offline sources to create a repository that can be used to detect consumers' buying behaviors. The trends and insights gathered over time can help organizations create effective campaigns and strategies to improve their ROI.

For several reasons, getting integrated with a Data Management Platform can be a game-changer for organizations, bringing them not just actionable insights but also the possibility for exponential growth. If you’re looking for a guide before picking a data management platform for your company, this is the blog you need. Rather than focusing on buying tips, we are going to talk about the factors that will influence your investment in a DMP.

Here is a list of questions that you must answer for your business to gain great clarity over the ideal DMP for your organization. Read on to know all about it!

Factors to Pay Attention to While Picking a Data Management Platform for Your Business

Local support

Deploying a data management platform is a lengthy process, so it is essential to have a local team to support you and professionals who can work together effectively. Can you expect constant support throughout Data Management Platform integration and after it? Can you meet your contacts regularly to assure streamlined communication?

Data collection & organization

Can the Data Management Platform collect and arrange the first-party data from all sources, be it offline, online, CRM, mobile, subscriptions, or others? Will it be able to adjust the data hierarchy as per your demands or will you need support to get it done? Are parent/child accounts available for your business while managing multiple data sets?

Audience building

Can you build complex audiences by choosing factors you need such as behavior, demographics, interests, content consumption, and motives? Is it possible to layer in third-party data as per the customer scale? Can you use the platform to forecast available opportunities and nooks for the chosen target audience?

Audience insights & reporting

Does the data management platform offer a range of analytics tools? Is it possible to make deep comparisons between first-party and third-party data for actionable insights? What solutions are offered to you, and how well can teams use these reports to create better strategies?

Retargeting

Apart from creating targeting campaigns, can your Data Management Platform make retargeting effective for audiences, both offline and online, based on a certain trigger?

Campaign optimization

Make sure that the chosen DMP has manual and automatic optimizations available to make campaigns work more effectively using the toolkit integrations. The apt platform will be able to analyze your data and find people from the audience who will show an active interest and intent to purchase your products and services, thereby reducing your overall campaign costs. How well does your DMP intend to optimize campaigns, making them laser-focused yet cost-effective?

Content personalization

By using the correct DMP, you will be able to use the organization's CRM to bring personalized content to site visitors based on their browsing behavior and motivations. You can discuss content customization with Data Management Platform vendors to clarify how they can help you provide a customized website experience to visitors.

Multiple device channels

Today, users want organizations to reach them where they are, be it an app, a social media site, or a website. An ideal DMP will be able to find and recognize customers across channels to give you a multi-device, cross-channel campaign. This raises the question of how the integration is made possible, whether through a third-party system or the DMP directly. It also helps if the Data Management Platform has its own extensive third-party network to expand the cross-device campaign's capabilities.

Second-party data

Collecting second-party data, apart from first- and third-party sources, is necessary too; but how does a business get access to such interested organizations? It would be of great utility if your DMP could gather such affiliates by itself, rather than having you reach out and incur additional costs.

Flexibility

From a general perspective, most Data Management Platforms perform all the main functions. But what is the extra edge your provider of choice is willing to offer? The factor that influences this most is the degree of flexibility: how well the platform can change and adapt to the changing nature and demands of customers. As customer behavior changes with market conditions, your DMP will need to switch approaches to help you provide better campaigns and services.

Deployment

Make sure that you have a specialized data management platform team in place for a strong initial integration. Ensure that your vendor possesses great training resources, completes the documentation with all the t’s crossed and i’s dotted, and also provides active service and support in case your teams need help. Also, look into getting stellar cross-device integration if your job is more focused on deploying campaigns so that you can get the best out of the data collected.

Conclusion:

When you’re on the lookout for a technological partner, it is crucial to ensure that there is a meeting of minds in the way you both operate. Ask all the questions you think are relevant from the above-mentioned discussion to get the best value out of the data management platform you have been scoping. Think of it as an interview for long-term investments, where this integration is meant to bring you well-defined and tangible returns.

About Artha Solutions:

Artha Solutions is a premier business and technology consulting firm providing insights and expertise in both business strategy and technical implementations. Artha brings forward thinking and innovation to a new level with years of technical and industry expertise and complete transparency. Artha has a proven track record working with SMB (small to medium businesses) to Fortune 500 enterprises turning their business and technology challenges into business value.

 

Unleashing Talend Machine Learning Capabilities

Introduction

This article covers how Talend Real-time Big Data can be used to effectively leverage Talend's real-time data processing and machine learning capabilities. The use case handled in this article is processing Twitter data in real time and classifying whether the person tweeting has post-traumatic stress disorder (PTSD). This solution can work for any major health condition, for example cancer, as discussed at the end.

What is PTSD?

PTSD is a mental disorder that can develop after a person is exposed to a traumatic event, such as sexual assault, warfare, traffic collisions, or other threats on a person's life.

 

Statistics about PTSD

  • 70% of adults in the U.S. have experienced some traumatic event at least once in their lives, and up to 20% of these people go on to develop PTSD.
  • An estimated 8% of Americans, 24.4 million people, have PTSD at any given time.
  • An estimated one out of every nine women develops PTSD, making women about twice as likely as men to have it.
  • Almost 50% of all outpatient mental health patients have PTSD.
  • Among people who are victims of a severe traumatic experience, 60 – 80% will develop PTSD.

Source: Taking a look at PTSD statistics

Insights into the solution

Given the sharp increase in the number of end-users of social networks, a humongous amount of data is written to social networks every day. To handle such a huge amount of data, we need a Hadoop ecosystem. Hence, this PTSD use case is classified as a Big Data use case, with Twitter as our data source.

 

Spark Framework
Apache Spark™ is a fast and general engine for large-scale data processing.
Random Forest Model
Random forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
Hadoop Cluster (Cloudera)
A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.
Hashing TF
As a text-processing algorithm, Hashing TF converts input data into fixed-length feature vectors to reflect the importance of a term (a word or a sequence of words) by calculating the frequency with which these words appear in the input data.
Talend Studio for Real Time Big Data
Talend Studio to perform MapReduce, Spark, Big Data real-time Jobs.
Inverse Document Frequency
As a text-processing algorithm, Inverse Document Frequency (IDF) is often used to process the output of the Hashing TF computation in order to downplay the importance of the terms that appear in too many documents.
Kafka Service
Apache Kafka is an open-source stream processing platform written in Scala and Java to provide a unified, high-throughput, low-latency platform for handling a real-time data feed.
Regex Tokenizer
Regex tokenizer performs advanced tokenization based on regular expression (regex) matching.

 

Step 1: Retrieve data from Twitter using Talend

Talend Studio not only supports Talend’s own components, but also custom-built components from third parties. All these custom-built components can be accessed from Talend Exchange, an online component store.

  • Taking advantage of a custom Twitter component, we can get data from Twitter by accessing both REST and Stream APIs.
  • To take advantage of the Hadoop ecosystem for Big Data, we implemented a real-time Kafka service to read data from Twitter.
  • Talend Studio for Real-time Big Data has Kafka components that we can leverage to consume the data captured by the Kafka service and pass it on to the next stages of the design in real time.

To perform all of the above, we need to get access to the Twitter API.
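Before looking at the Job designs, it helps to see what the Kafka leg does conceptually. Below is a minimal hand-written sketch of the same read using Spark Structured Streaming in Java. It is not the code Talend generates; the broker address and topic name are assumptions for illustration, and it requires the spark-sql-kafka connector on the classpath.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class TweetStreamReader {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("TweetStreamReader")
                .getOrCreate();

        // Subscribe to the topic that the Twitter ingestion service writes to.
        // Broker and topic names are illustrative assumptions.
        Dataset<Row> tweets = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "tweets")
                .load()
                .selectExpr("CAST(value AS STRING) AS tweet");

        // Print incoming tweets to the console; a real Job would pass them
        // to the next stage of the design instead.
        StreamingQuery query = tweets.writeStream()
                .format("console")
                .start();
        query.awaitTermination();
    }
}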

 

Snapshots of Talend Job designs

Deciding which hashtags to use plays a vital role. We may use a single hashtag, or a combination of multiple hashtags, to pull exactly the data required. Choosing appropriate hashtags helps filter the large volume of source data.

Step 2: Create and train the model using Talend

Classification of this kind cannot be done without human intervention. Once the data pulled from Twitter is in place, we need to manually classify the tweets as Having PTSD or Not Having PTSD.

Classification can be done by adding a new attribute to the data, with values Yes or No (Yes – having PTSD, No – not having PTSD). Once the classification is done, this data becomes the training set used to create and train the model.

To achieve our use case, the training data needs to undergo some transformations before the model is created:

  1. Hashing TF
  2. Regex Tokenizer
  3. Inverse Document Frequency
  4. Vector Conversion

After passing through all the algorithms above, the training data can be passed into the model to create and train it. The model that best suits this prediction use case is the Random Forest model.

Talend Studio for Real-time Big Data has some very good machine learning components that can perform regression, classification, and prediction using the Spark framework. Leveraging Talend’s capability to handle machine learning tasks, we created and trained the Random Forest model with the training data. Now we have the model ready to classify tweets.
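As a point of reference, here is a minimal hand-written equivalent of the training flow in Spark ML (Java). It is a sketch, not the code Talend generates: the input path, column names, feature-vector size, and tree count are all assumptions for illustration.

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.RandomForestClassifier;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IDF;
import org.apache.spark.ml.feature.RegexTokenizer;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PtsdModelTraining {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("PtsdModelTraining")
                .getOrCreate();

        // Labeled tweets: a "tweet" text column and a "ptsd" Yes/No column.
        Dataset<Row> training = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/ptsd/training_set.csv");

        // Regex Tokenizer: split each tweet into word tokens.
        RegexTokenizer tokenizer = new RegexTokenizer()
                .setInputCol("tweet").setOutputCol("tokens")
                .setPattern("\\W+");

        // Hashing TF: hash tokens into fixed-length term-frequency vectors.
        HashingTF hashingTF = new HashingTF()
                .setInputCol("tokens").setOutputCol("rawFeatures")
                .setNumFeatures(4096);

        // IDF: down-weight terms that appear in too many tweets.
        IDF idf = new IDF()
                .setInputCol("rawFeatures").setOutputCol("features");

        // Turn the Yes/No classification into a numeric label.
        StringIndexer labelIndexer = new StringIndexer()
                .setInputCol("ptsd").setOutputCol("label");

        // Random Forest: the ensemble classifier chosen for this use case.
        RandomForestClassifier forest = new RandomForestClassifier()
                .setLabelCol("label").setFeaturesCol("features")
                .setNumTrees(50);

        Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]{
                tokenizer, hashingTF, idf, labelIndexer, forest});

        PipelineModel model = pipeline.fit(training);
        model.write().overwrite().save("hdfs:///models/ptsd_rf");
    }
}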

Note: All the work is done on a Cloudera Hadoop cluster; Talend is connected to the cluster, and the rest of the computation is carried out by Talend.

 

Snapshot of a Talend Spark Job design

 

Step 3: Prediction of tweets using Talend

Now we have the model ready on our Hadoop cluster. We can use the process from step 1 to pull data from Twitter again, which acts as test data. The test data has only one attribute: Tweet.

When the test data is passed to the model we have created, the model adds a new attribute, Label, to the test data, and its value will be Yes or No (Yes – having PTSD, No – not having PTSD). The predicted value depends solely on the way the model was trained in step 2. Again, all of this prediction can be done in Talend Studio for Real-time Big Data using the Spark framework.
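Hand-written, the scoring step amounts to loading the saved pipeline and transforming the new tweets. The sketch below continues the training example above; paths and column names are again assumptions for illustration.

import org.apache.spark.ml.PipelineModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PtsdPrediction {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("PtsdPrediction")
                .getOrCreate();

        // Load the pipeline saved by the training Job.
        PipelineModel model = PipelineModel.load("hdfs:///models/ptsd_rf");

        // Test data has a single attribute: the tweet text.
        Dataset<Row> tests = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/ptsd/new_tweets.csv");

        // transform() runs the full pipeline (tokenize, TF, IDF, forest)
        // and appends the predicted label as a new column.
        Dataset<Row> predictions = model.transform(tests);
        predictions.select("tweet", "prediction").show(20, false);
    }
}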

 

Snapshot of a Talend Spark Job design for prediction

Evolution of the model

When the model first classifies the test data set, we find that roughly 25% of the records (on average) are misclassified. We need to assign the right classification to that 25% of the records, add them to the training set, and retrain the model; its predictions then improve. Add more records to the training set and repeat the same procedure until the model becomes accurate. A model needs to evolve over time by retraining it with the new training data that arrives, so some ongoing management is required.

Note: To boost the effectiveness of the model, we can add synonyms of the training data to the training set and retrain the model, which leads to developing the model synthetically rather than just organically.

A threshold of 90% accurate predictions is required to classify the model as accurate. If the prediction accuracy drops below 90%, it is time to retrain the model.
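One way to keep an eye on that threshold is sketched below, assuming a held-out labeled data set that has already been scored by the pipeline (so it carries the numeric "label" and "prediction" columns from the examples above).

import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class ModelAccuracyCheck {
    // Returns true when prediction accuracy has dropped below the 90%
    // threshold and the model is due for retraining.
    public static boolean needsRetraining(Dataset<Row> scored) {
        double accuracy = new MulticlassClassificationEvaluator()
                .setLabelCol("label")
                .setPredictionCol("prediction")
                .setMetricName("accuracy")
                .evaluate(scored);
        return accuracy < 0.90;
    }
}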

Real-time applications from this use case

Note: Once the classification of data is done (Yes or No), it may lead to many more useful real-time applications.

Broader Scope

The solution designed for this use case can work for any major health condition. For example, for cancer, we can train the model in an equivalent way using cancer-specific hashtags and start predicting whether a person has cancer. The same real-time applications discussed above can be achieved.

Authors: Madhav Nalla, Saikrishna Ala, and Kashyap Shah

This article was also published on the Talend Community Blog:
Source: https://community.talend.com/s/article/Unleashing-Talend-Machine-Learning-Capabilities

Achieve better performance with an efficient lookup input option in Talend Spark Streaming

Description

Talend provides two options to deal with lookups in Spark streaming Jobs: a simple input component (for example, tMongoDBInput) or a lookup input component (tMongoDBLookupInput). Using a lookup input component provides a significant lift in performance and code optimization for any Spark streaming Job.

Instead of reading the entire data set through the lookup component, Talend provides a unique option for streaming Jobs: querying a smaller chunk of input data for each lookup, thereby saving an enormous amount of time and building highly performant Jobs.

By Definition

Lookup components like tMongoDBLookupInput, tJDBCLookupInput, and others provided by Talend execute a database query with a strictly defined order that must correspond to the schema definition.

It passes on the extracted data to tMap in order to provide the lookup data to the main flow. It must be directly connected to a tMap component, and requires this tMap to use Reload at each row or Reload at each row (cache) for the lookup flow.

The tricky part here is to understand the usage of the Reload at each row functionality of the Talend tMap component, and how it can be integrated with the lookup component.

Example

Below is an example of how we have used a tJDBCLookupInput component with tMap in a Talend Spark Streaming Job.

 

  1. At the tMap level, make sure the tMap for the lookup is set up with Reload at each row, and that an expression for the globalMap key is defined as well.
  2. At the lookup input component level, make sure the Query option is set up to use the globalMap key defined in tMap (here, extract.consumer_id) in its WHERE condition, as sketched below. This is key to making sure the lookup component only fetches the data needed for processing at that point in time.
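For illustration, the Query field of the lookup component then looks something like the following Talend (Java) expression. The table and column names and the globalMap key are assumptions based on this example; the point is that the key set by tMap for the current row is injected into the WHERE clause.

"SELECT c.consumer_id, c.consumer_name FROM consumers c WHERE c.consumer_id = '"
    + ((String) globalMap.get("extract.consumer_id")) + "'"

Because tMap reloads the lookup at each row, this query runs once per incoming record and fetches only the matching row, instead of pulling the whole table into the Job.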

Summary

As we have seen, these minute changes in our Streaming Jobs can make our ETL Jobs more effective and performant. As there will always be multiple implementations of a Talend ETL Job, the ability to understand the nuances in making them more efficient is an integral part of being a data engineer.

For more information, reach out to us at: solutions@thinkartha.com

Author: Siddartha Rao Chennur

This article was also published on the Talend Community:
Source: https://community.talend.com/s/article/Achieve-better-performance-with-an-efficient-lookup-input-option-in-Talend-Spark-Streaming

Quick Start Guide: Talend and Docker

Enterprise deployment work is notorious for being hidebound and slow to react to change. With many organizations adopting Docker and container services, it becomes easy to incorporate the Talend deployment life cycle into existing Docker and container services, creating a more unified deployment platform that can be shared across applications within an organization.

This article is intended as a quick start guide on how to generate Talend Jobs as Docker images using a Docker service that is on a remote host.

Also, to provide a better understanding of handling Docker images, a few topics below draw comparisons between sh/bat scripts and Docker images.

Setting up your Docker for remote build

Talend Studio needs to connect to a Docker service to be able to generate a Docker image.

The Docker service can run on the machine where Talend Studio is installed, or on a remote host. This step is needed only if Talend Studio and Docker are running on different hosts; skip it if they share the same machine.
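One common way to expose the daemon for remote builds is to start dockerd with an additional TCP listener, as sketched below. Exact steps vary by OS and Docker version, and TCP port 2375 is unauthenticated, so restrict it to trusted networks.

sudo dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375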

Building a Docker Image from Talend Studio v7.1 or Greater

In v7.1, Talend introduced the fabric8 Maven plugin to generate a Docker image directly from Talend Studio.

Using Talend Studio, we can build a Docker image and store it in the local Docker repository, or build and publish it to any registry of our choice.

Let us look at both options:

Build the Docker Image from Talend Studio

  1. Right-click on the Job and navigate to the Build Job option:
  2. Under build type, select Docker Image:

3. Choose the appropriate context and log4j level.

4. Under Docker Options, select Local if Docker and Studio are installed on the same host, or select Remote if your Docker service is running on a different host from the one where Talend Studio is installed. In our example, we enabled Docker for a remote build via TCP on port 2375:

tcp://dockerhostIP:2375

5. Once this is done, your Docker image is built and stored in the Docker repository, in our example on host 2.

6. Log in to the Docker host, in our example host 2, and execute the command docker images. You should be able to view the image we just built:

Build and Publish the Docker Image to the Registry from Talend Studio

Talend Studio can be used to build a Docker image, and the image can be published to any registry where the images can be picked up by Kubernetes or any container services. In our example, I have set up an AWS ECR registry.

  1. Right-click on the Job name and navigate to the Publish option.


2. Select the Export Type Docker Image:

3. Under Docker Options, provide the Docker host and port details as discussed in the previous topics. Give the necessary details of the registry and Docker image name:

Image Name = Repository Name
Image Tag = Jobname_Version
Username = AccessKeyId (AWS)
Password = Secret (AWS)

4. Once this is done, navigate to AWS ECR, and you should be able to search for and find the image.

Running Docker Images vs Shell or Bat scripts

With Talend, we are all accustomed to either .sh or .bat scripts, so for a better understanding of how to run Docker images, let’s cover various aspects, like passing runtime parameters and volume mounting, in detail below.

Passing Run Time Parameters to a Docker Image

To run a Docker image that is in your Docker repository (a Talend Job built as a Docker image):

  1. List all the Docker Images by running the command docker images:
  2. Now I want to run the image madhav_tmc/tlogrow, tag latest, which uses a tWarn component to print a message. Part of the message will come from the context variable param.

3. Run the Docker image by passing a value to the context variable param at runtime:

docker run madhav_tmc/tlogrow:latest --context_param param="Hello TalendDocker"

Below, in the log, we can see the value passed to the Docker image at runtime: