RevComm Inc.: harBest Boosts MiiTel’s AI Speech Recognition
[Interview with RevComm Head of Development, Mr. Hashimoto]
High-quality training data is the foundation of highly accurate speech recognition AI. How is that data created?
Could you give us an overview of your company's business?
RevComm's mission is to "reinvent communication and create a society where people care about each other." We provide "MiiTel," an IP phone service equipped with voice analysis AI. MiiTel is software rather than a device, so all you need to make a call is a computer. Because it eliminates the need for phone hardware and reduces fixed costs, MiiTel has been adopted by many call-intensive teams, such as inside sales departments and call centers.
What sets MiiTel apart is that it stores every call as voice data in the cloud. Using AI, it then quantifies and visualizes various aspects of each speaker's performance, including speaking skill, speed, tone, keyword usage, and how keywords are woven into the conversation.
Beyond phone calls, MiiTel also integrates with online meeting platforms such as Zoom, Google Meet, and Teams, so it can analyze meeting content, presentation style, and related context, giving a comprehensive picture of virtual interactions.
MiiTel now has more than 50,000 users, and this year we were named to the "Forbes AI 50" list.
Could you tell us about your career, Mr. Hashimoto?
I currently serve as an executive officer and research director at RevComm, Inc. I entered Tokyo Institute of Technology in 1993 and stayed on for my master's and doctoral studies, completing them in 2002. In 2012 I moved into industry, first at GREE and then at LINE, before joining RevComm. My specialty is natural language processing (NLP), the field behind text-based systems such as "ChatGPT," and my work has focused on improving how people interact with computers and with each other through text.
Could you show us what MiiTel's screen actually looks like?
This screen shows detailed call information: who called, when, how often, and what was said in each call. It also shows the dynamics of the call, such as the ratio of speaking time between the two parties (e.g., 87:13). This ratio works like ball possession in soccer: around 50:50 generally indicates good two-way communication, whereas a lopsided ratio such as 87:13, where one party dominates the conversation, suggests a possible communication problem.
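The speaking-time ratio can be illustrated with a short Python sketch. This is not MiiTel's actual implementation, which the interview does not describe: it assumes the call has already been split into per-speaker speech segments (for example by a speaker diarization step), and the `Segment` type, the speaker labels, and the `talk_ratio` function are hypothetical. Silence is excluded, so the two shares always sum to 100.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A stretch of speech attributed to one speaker (hypothetical format)."""
    speaker: str
    start: float  # seconds from the start of the call
    end: float    # seconds from the start of the call


def talk_ratio(segments, a, b):
    """Return speakers a and b's shares of total speaking time as integer percentages.

    Silence (time covered by no segment) is ignored, so the two shares sum to 100.
    Overlapping speech counts toward both speakers.
    """
    totals = {a: 0.0, b: 0.0}
    for seg in segments:
        if seg.speaker in totals:
            totals[seg.speaker] += max(0.0, seg.end - seg.start)
    spoken = totals[a] + totals[b]
    if spoken == 0:
        raise ValueError("no speech found for either speaker")
    share_a = round(100 * totals[a] / spoken)
    return share_a, 100 - share_a


call = [
    Segment("rep", 0.0, 60.0),
    Segment("customer", 60.0, 73.0),
    Segment("rep", 75.0, 102.0),  # the 2 s gap before this segment is silence
]
print(talk_ratio(call, "rep", "customer"))  # (87, 13)
```

A result near (50, 50) suggests a balanced exchange, while a result like (87, 13) flags the kind of one-sided call described in the interview.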
Take this call as an example: a customer is lodging a complaint, and the rest of the conversation centers on resolving it. Voice emotion recognition shows predominantly negative sentiment, because the customer's side of the dialogue focuses on the complaint and the sales representative's side consists of responses to it. Handling the complaint is also why the sales representative's share of speaking time is so high.
Is the negative reaction judged from tone of voice or from content?
Our voice emotion recognition engine uses both speech and language features. For example, when people are angry, they tend to speak louder or faster, and we use machine learning to capture these characteristics. The engine also uses the text of the conversation at the same time, so it can learn negative words and estimate emotions from them. Because the model is entirely deep-learning-based, it is difficult to explain exactly how these factors are combined...
Did the service become popular due to the COVID-19 pandemic?
Yes. MiiTel is a telephone service that does not require traditional landlines. Telemarketing teams have traditionally worked from an office, making calls on multiple desk phones. During the pandemic, however, remote work became essential.
With MiiTel, employees can make calls from home on their computers, with no dedicated phone hardware needed. As a result, our service grew substantially during the pandemic.
What was the project you used harBest for this time?
We use AI to convert voice data into text, that is, to transcribe speech. To learn this, the model first needs training data in which each voice recording is paired with its corresponding transcript. We asked harBest to create that paired data for us.
Could you tell us why you chose harBest and what challenges you were facing at the time?
Our challenge was that training an accurate speech-to-text model requires a very large amount of data, and producing that volume ourselves was difficult. We were extremely grateful that harBest could generate the data we needed efficiently and in a short time, which helped us overcome this challenge.
harBest's annotation management screen for voice data. Users can play back the audio and edit the transcript.
Before using harBest, did you do this in-house?
Yes, we previously handled transcription in-house using our own tools. However, the work was labor-intensive and required significant resources and staff to oversee, and we ended up managing a large workforce dedicated to it. harBest's support has streamlined and simplified our data annotation process, and we greatly appreciate it.
Text transcribed by harBest users, formatted as CSV before delivery.
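To make the paired audio-and-transcript format concrete, here is a minimal Python sketch of loading such a CSV delivery into training examples. The column names `audio_file` and `transcript`, the `TrainingPair` type, and the `load_pairs` function are hypothetical; the interview does not describe the actual delivery schema. Rows with an empty transcript are skipped so they never become training targets.

```python
import csv
from dataclasses import dataclass


@dataclass
class TrainingPair:
    """One speech recognition training example: a recording and its transcript."""
    audio_path: str
    transcript: str


def load_pairs(lines):
    """Parse delivery CSV rows into audio/transcript pairs.

    `lines` is any iterable of CSV lines (an open file or a list of strings).
    Assumes hypothetical columns 'audio_file' and 'transcript'.
    """
    pairs = []
    for row in csv.DictReader(lines):
        # Missing or blank transcripts are skipped rather than used as empty targets.
        text = (row.get("transcript") or "").strip()
        if not text:
            continue
        pairs.append(TrainingPair(audio_path=row["audio_file"].strip(), transcript=text))
    return pairs


sample = [
    "audio_file,transcript",
    "call_001.wav,Thank you for calling.",
    "call_002.wav,",
    'call_003.wav,"Yes, I can help with that."',
]
for pair in load_pairs(sample):
    print(pair.audio_path, "->", pair.transcript)
```

With a real file, pass `open(path, newline="", encoding="utf-8")` as `lines`; the `csv` module handles quoted fields that contain commas, as in the third sample row.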
How big is the difference between outsourcing to harBest and keeping the work in-house?
The difference is substantial. When we handled transcription internally, resource constraints meant we could produce only about one-fifth of the monthly volume we now receive. Since switching to harBest, our monthly output has increased approximately fivefold.
Thank you very much. How did you first come to try harBest?
I tried harBest when I learned about its free distribution of 3,000 face images [previous harBest campaign]. When I reviewed the data, I was impressed by its quality. At the time, I was facing the problem I mentioned earlier: data creation was slow, and I realized that without enough data, our model's performance would not improve. I was already considering outsourcing as a solution.
I understand you compared several other vendors. What were the deciding factors in choosing harBest?
Consistent quality and low cost were the deciding factors.
Do you have any comments on how you are currently using our service? Is there anything you like about it, or anything you would like to see improved?
The quality of the data we receive is excellent and consistent, and there is nothing in particular I would like to see improved. We do check the data after receiving it, but thanks to APTO’s quality control, it is fine as delivered.
Thank you very much! Could you tell us about your company's vision for the future?
We would like to be able to detect the moments when a meeting becomes lively. Customers also tell us they want to predict whether a business meeting will go well. Ultimately, we want what is said in calls and meetings to become knowledge and data for business decisions, and our goal is to surface insights that management can act on.
RevComm has created a very valuable service, and we are very happy to be able to support it. Thank you very much for your valuable time today.