LangChain Output Parsers: Parse LLM Responses (Complete Guide)
When you ask an LLM a question, you get back raw text. Output parsers transform that raw text into structured, usable data like JSON, lists, or custom objects.
What is an Output Parser?
An output parser takes unstructured LLM output and converts it into structured data:
LLM Output (raw text):
"The weather is sunny with a temperature of 72°F and 30% humidity"
↓
[Output Parser]
↓
Structured Data:
{
"weather": "sunny",
"temperature": 72,
"humidity": 30
}
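Conceptually, a parser is just a function from raw text to structured data. Here is a minimal, framework-free sketch of that idea (the regexes and field names are illustrative, not LangChain's):

```python
import re

def parse_weather(text: str) -> dict:
    """Toy parser: pull structured fields out of a free-text weather report."""
    condition = re.search(r'\b(sunny|cloudy|rainy|snowy)\b', text)
    temperature = re.search(r'(\d+)\s*°?F', text)
    humidity = re.search(r'(\d+)\s*%\s*humidity', text)
    return {
        "weather": condition.group(1) if condition else None,
        "temperature": int(temperature.group(1)) if temperature else None,
        "humidity": int(humidity.group(1)) if humidity else None,
    }

raw = "The weather is sunny with a temperature of 72°F and 30% humidity"
print(parse_weather(raw))
# {'weather': 'sunny', 'temperature': 72, 'humidity': 30}
```

LangChain's parsers do the same job, but also generate the prompt instructions that make the LLM produce a parseable format in the first place.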
Why You Need Output Parsers
Without Output Parsers
response = llm.predict("List 3 fruits")
# Returns: "Here are three fruits:\n1. Apple\n2. Banana\n3. Orange"
# Parsing manually (error-prone)
lines = response.split('\n')
fruits = [line.split('. ')[1] for line in lines if '. ' in line]
# Fragile - breaks if LLM changes format slightly
With Output Parsers
from langchain.output_parsers import CommaSeparatedListOutputParser
parser = CommaSeparatedListOutputParser()
prompt = PromptTemplate(
template="List 3 fruits: {format_instructions}",
input_variables=[],
partial_variables={"format_instructions": parser.get_format_instructions()}
)
response = llm.predict(prompt.format())
fruits = parser.parse(response)
# Returns: ["Apple", "Banana", "Orange"]
# Robust - works consistently
Built-in Output Parsers
1. CommaSeparatedListOutputParser
from langchain.output_parsers import CommaSeparatedListOutputParser
parser = CommaSeparatedListOutputParser()
# Get format instructions to include in prompt
instructions = parser.get_format_instructions()
print(instructions)
# "Your response should be a list of comma separated values, eg: 'foo, bar, baz'"
# Parse LLM response
response = "Apple, Banana, Orange"
result = parser.parse(response)
print(result) # ['Apple', 'Banana', 'Orange']
2. StructuredOutputParser
Parse LLM output into structured objects with named fields:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
response_schemas = [
ResponseSchema(name="weather", description="The weather condition"),
ResponseSchema(name="temperature", description="Temperature in Fahrenheit"),
ResponseSchema(name="humidity", description="Humidity percentage"),
]
parser = StructuredOutputParser.from_response_schemas(response_schemas)
# Get instructions to put in prompt
instructions = parser.get_format_instructions()
prompt = PromptTemplate(
template="Describe the weather: {format_instructions}",
input_variables=[],
partial_variables={"format_instructions": instructions}
)
response = llm.predict(prompt.format())
# LLM returns:
# ```json
# {
# "weather": "sunny",
# "temperature": 72,
# "humidity": 30
# }
# ```
result = parser.parse(response)
print(result)
# {'weather': 'sunny', 'temperature': 72, 'humidity': 30}
3. PydanticOutputParser
Use Python type hints for strict validation:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
class PersonInfo(BaseModel):
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age")
    occupation: str = Field(description="What they do for work")
parser = PydanticOutputParser(pydantic_object=PersonInfo)
instructions = parser.get_format_instructions()
prompt = PromptTemplate(
template="Extract person info from: '{text}'\n{format_instructions}",
input_variables=["text"],
partial_variables={"format_instructions": instructions}
)
text = "John Smith is a 35-year-old software engineer"
response = llm.predict(prompt.format(text=text))
person = parser.parse(response)
print(person.name) # "John Smith"
print(person.age) # 35
print(person.occupation) # "software engineer"
4. JsonOutputParser
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field

# pydantic_object must be a Pydantic model (not a raw JSON-schema dict);
# it is optional - without it, the parser accepts any valid JSON
class Profile(BaseModel):
    name: str = Field(description="Person's name")
    age: int = Field(description="Age in years")
    skills: list[str] = Field(description="List of skills")

parser = JsonOutputParser(pydantic_object=Profile)
response = '''
{
"name": "Alice",
"age": 30,
"skills": ["Python", "JavaScript", "React"]
}
'''
result = parser.parse(response)
print(result)
# {'name': 'Alice', 'age': 30, 'skills': ['Python', 'JavaScript', 'React']}
5. BooleanOutputParser
from langchain.output_parsers import BooleanOutputParser
parser = BooleanOutputParser()
response = "Yes, the statement is correct."
result = parser.parse(response)
print(result) # True
response = "No, I disagree."
result = parser.parse(response)
print(result) # False
Custom Output Parsers
Create your own for domain-specific needs:
import re

from langchain.schema import BaseOutputParser

class SentimentParser(BaseOutputParser):
    def parse(self, text: str) -> dict:
        """Parse sentiment and confidence score."""
        # Simple keyword heuristic - in production, use a real classifier
        text_lower = text.lower()
        if 'positive' in text_lower or 'great' in text_lower:
            sentiment = 'positive'
        elif 'negative' in text_lower or 'bad' in text_lower:
            sentiment = 'negative'
        else:
            sentiment = 'neutral'
        # Extract a confidence percentage (0-100) if one is present
        match = re.search(r'(\d+)\s*%', text)
        confidence = int(match.group(1)) / 100 if match else 0.5
        return {
            'sentiment': sentiment,
            'confidence': confidence,
            'raw_text': text,
        }
parser = SentimentParser()
result = parser.parse("This is great! 85% confidence")
print(result)
# {'sentiment': 'positive', 'confidence': 0.85, 'raw_text': '...'}
Chaining Parsers
Combine a parser with follow-up LLM calls for multi-step transformations:
from langchain.output_parsers import CommaSeparatedListOutputParser

# First parse the response as a list
list_parser = CommaSeparatedListOutputParser()
response = llm.predict("List fruits, vegetables, and proteins")
items = list_parser.parse(response)
# e.g. ["apple", "broccoli", "chicken"]

# Then process each item
results = []
for item in items:
    detailed_response = llm.predict(f"Describe the nutritional value of {item}")
    results.append(detailed_response)
Real-World Examples
Example 1: Extract Key Information from Text
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
response_schemas = [
ResponseSchema(name="title", description="Title of the article"),
ResponseSchema(name="author", description="Author name"),
ResponseSchema(name="date", description="Publication date"),
ResponseSchema(name="summary", description="2-3 sentence summary"),
]
parser = StructuredOutputParser.from_response_schemas(response_schemas)
template = """Extract key information from the following text:
Text: {text}
{format_instructions}"""
prompt = PromptTemplate(
input_variables=["text"],
partial_variables={"format_instructions": parser.get_format_instructions()},
template=template
)
article_text = """
Cloud Computing: The Future of Technology
By Sarah Johnson
Published March 15, 2024
Cloud computing is revolutionizing how businesses operate...
"""
llm = OpenAI()
output = llm.predict(prompt.format(text=article_text))
extracted = parser.parse(output)
print(extracted)
# {'title': 'Cloud Computing: The Future of Technology',
# 'author': 'Sarah Johnson',
# 'date': 'March 15, 2024',
# 'summary': '...'}
Example 2: Validate API Response Format
from typing import Optional

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, validator

class APIResponse(BaseModel):
    status: str
    data: dict
    error: Optional[str] = None

    @validator('status')
    def status_valid(cls, v):
        if v not in ['success', 'error']:
            raise ValueError('status must be success or error')
        return v
parser = PydanticOutputParser(pydantic_object=APIResponse)
# Validate LLM-generated API response
response_text = '''
{
"status": "success",
"data": {"user_id": 123},
"error": null
}
'''
response = parser.parse(response_text)
# Pydantic validates automatically
print(response.status) # 'success'
print(response.data) # {'user_id': 123}
Example 3: Multi-Step Parsing
import re

# Step 1: Get the raw response
raw = llm.predict("Generate 3 movie recommendations with ratings")
# e.g. "1. Avatar (8.5/10), 2. Inception (9/10), 3. Oppenheimer (8/10)"

# Step 2: Extract structured data with a regex
# (the numbered, comma-joined format doesn't fit the built-in list parser,
# so we pull out each "Title (rating/10)" pair directly)
movies = []
for match in re.finditer(r'\d+\.\s*(.+?)\s+\((\d+\.?\d*)/10\)', raw):
    movies.append({
        'title': match.group(1),
        'rating': float(match.group(2))
    })
print(movies)
# [{'title': 'Avatar', 'rating': 8.5}, ...]
Error Handling
from langchain.output_parsers import PydanticOutputParser
from langchain.schema import OutputParserException
from pydantic import BaseModel

class StrictData(BaseModel):
    id: int
    name: str

parser = PydanticOutputParser(pydantic_object=StrictData)

# Invalid response
invalid_response = '{"id": "not_a_number", "name": "Alice"}'
try:
    result = parser.parse(invalid_response)
except OutputParserException as e:
    # PydanticOutputParser wraps JSON and validation errors in OutputParserException
    print(f"Parsing failed: {e}")
    # Handle the error gracefully, e.g. retry with clearer format instructions
Best Practices
- Always provide format instructions - Include parser instructions in your prompt
- Be specific - Tell the LLM exactly what format you expect
- Handle errors - Wrap parsing in try-except blocks
- Validate output - Use Pydantic for strict validation
- Test thoroughly - LLMs may generate unexpected formats
- Chain parsers - Combine simple parsers for complex outputs
- Log results - Track parsing failures for debugging
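The error-handling and logging advice above usually ends up as a small retry loop around the parser. Here is a hedged sketch of that pattern; `parse_with_retry` is a hypothetical helper, not a LangChain API:

```python
def parse_with_retry(parser, generate, prompt: str, max_attempts: int = 3):
    """Call `generate(prompt)` and parse the result, re-prompting on failure.

    `parser` needs a .parse(text) method; `generate` is any text-in/text-out
    callable (e.g. a wrapper around llm.predict). Hypothetical helper, not a
    LangChain API.
    """
    last_error = None
    for attempt in range(max_attempts):
        response = generate(prompt)
        try:
            return parser.parse(response)
        except Exception as e:  # LangChain parsers raise OutputParserException
            last_error = e
            # Feed the error back so the model can correct its format
            prompt = f"{prompt}\nYour last answer failed to parse ({e}). Try again."
    raise ValueError(f"Parsing failed after {max_attempts} attempts: {last_error}")
```

LangChain also ships OutputFixingParser and RetryOutputParser for this purpose; the sketch above just shows the underlying loop.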
Comparing Output Parsers
| Parser | Use Case | Strictness |
|---|---|---|
| CommaSeparatedList | Simple lists | Low |
| StructuredOutput | Fixed fields | Medium |
| Pydantic | Type-validated objects | High |
| JSON | Custom JSON schema | High |
| Custom | Domain-specific formats | Any |
Performance Considerations
# Avoid re-parsing same format repeatedly
# Cache the parser
parser = PydanticOutputParser(pydantic_object=MyClass)
# Reuse for many LLM calls
for item in items:
    response = llm.predict(f"Process {item}")
    parsed = parser.parse(response)  # reuses the cached parser
Conclusion
Output parsers are essential for production LLM applications:
- Structured data - Convert text to usable formats
- Validation - Ensure LLM responses match expectations
- Error handling - Gracefully handle parsing failures
- Chaining - Combine parsers for complex workflows
Use the right parser for your use case and always validate LLM output.