Ploomber AI Editor | multimodal-ai-assistant-6e93

App Description

Multimodal AI Assistant with Memory and Speech

Multimodal Input Support: Accepts image, video, text, and audio inputs, allowing for versatile applications such as visual question answering and speech recognition.

Real-Time Speech Interaction: Supports bilingual real-time speech conversations with configurable voices, including features like emotion, speed, and style control, as well as end-to-end voice cloning and role play.


Code Editor for app.py

import streamlit as st
import numpy as np
import cv2
import speech_recognition as sr

st.title('Multimodal AI Assistant')

# Image upload
uploaded_image = st.file_uploader('Upload an Image', type=['jpg', 'jpeg', 'png'])
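if uploaded_image is not None:
    # cv2.imdecode returns an ndarray in BGR order, or None if the bytes cannot be decoded
    image_data = cv2.imdecode(np.frombuffer(uploaded_image.read(), np.uint8), cv2.IMREAD_COLOR)
    if image_data is None:
        st.error('Could not decode the uploaded image')
    else:
        st.image(image_data, caption='Uploaded Image', channels='BGR')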

# Video upload
uploaded_video = st.file_uploader('Upload a Video', type=['mp4', 'mov', 'avi'])
if uploaded_video is not None:
    st.video(uploaded_video)

# Text input
user_input_text = st.text_input('Enter text input:')
if user_input_text:
    st.write('You entered:', user_input_text)

# Audio input
# st.audio_input replaced st.experimental_audio_input in Streamlit 1.40
audio_input = st.audio_input('Record a voice message')
if audio_input:
    st.audio(audio_input)

# Speech recognition (recognize_google calls the Google Web Speech API, so it needs network access)
recognizer = sr.Recognizer()
if audio_input:
    audio_input.seek(0)  # rewind in case the buffer was already read
    with sr.AudioFile(audio_input) as source:
        audio_data = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio_data)
        st.write('Recognized Speech:', text)
    except sr.RequestError:
        st.error('API unavailable')
    except sr.UnknownValueError:
        st.error('Could not understand audio')
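
The speech recognition step above passes the recording straight to the recognizer. One possible safeguard, sketched here with only the standard library, is to confirm that the bytes are valid WAV and measure their duration first, so that empty or corrupt recordings can be skipped. The helper name wav_duration_seconds is hypothetical and not part of the original app.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of WAV-encoded bytes, or 0.0 if they are not valid WAV."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav_file:
            frame_rate = wav_file.getframerate()
            if frame_rate == 0:
                return 0.0
            # Duration is the number of audio frames divided by frames per second
            return wav_file.getnframes() / frame_rate
    except (wave.Error, EOFError):
        # wave raises wave.Error for malformed headers and EOFError for truncated data
        return 0.0
```

In the app, this could be called as wav_duration_seconds(audio_input.getvalue()) before building the sr.AudioFile, assuming the audio widget returns WAV data; a result of 0.0 would then be treated as "nothing to transcribe".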
