
LangChain Output Parsers: Parse LLM Responses (Complete Guide)

Master LangChain output parsers for structuring LLM responses. Learn about StrOutputParser, PydanticOutputParser, and custom parsers.

March 1, 2026 · 16 min read · By Mathematicon


When you ask an LLM a question, you get back raw text. Output parsers transform that raw text into structured, usable data like JSON, lists, or custom objects.

What is an Output Parser?

An output parser takes unstructured LLM output and converts it into structured data:

LLM Output (raw text):
"The weather is sunny with a temperature of 72°F and 30% humidity"
         ↓
   [Output Parser]
         ↓
Structured Data:
{
  "weather": "sunny",
  "temperature": 72,
  "humidity": 30
}
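Before reaching for a library, it helps to see what this transformation looks like in plain Python. The sketch below (no LangChain required; the `parse_weather` helper and its regexes are illustrative, not a LangChain API) pulls the three fields out of the raw sentence:

```python
import re

def parse_weather(text: str) -> dict:
    """Pull weather condition, temperature, and humidity out of free text."""
    condition = re.search(r"\b(sunny|cloudy|rainy|snowy)\b", text, re.IGNORECASE)
    temperature = re.search(r"(\d+)\s*°?F", text)
    humidity = re.search(r"(\d+)\s*%", text)
    return {
        "weather": condition.group(1).lower() if condition else None,
        "temperature": int(temperature.group(1)) if temperature else None,
        "humidity": int(humidity.group(1)) if humidity else None,
    }

raw = "The weather is sunny with a temperature of 72°F and 30% humidity"
print(parse_weather(raw))
# {'weather': 'sunny', 'temperature': 72, 'humidity': 30}
```

Hand-rolled regexes like these are exactly the fragile code that output parsers exist to replace, as the next section shows.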

Why You Need Output Parsers

Without Output Parsers

response = llm.predict("List 3 fruits")
# Returns: "Here are three fruits:\n1. Apple\n2. Banana\n3. Orange"

# Parsing manually (error-prone)
lines = response.split('\n')
fruits = [line.split('. ')[1] for line in lines if '. ' in line]
# Fragile - breaks if LLM changes format slightly

With Output Parsers

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

parser = CommaSeparatedListOutputParser()
prompt = PromptTemplate(
    template="List 3 fruits: {format_instructions}",
    input_variables=[],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

response = llm.predict(prompt.format())
fruits = parser.parse(response)
# Returns: ["Apple", "Banana", "Orange"]
# Robust - works consistently

Built-in Output Parsers

1. CommaSeparatedListOutputParser

from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

# Get format instructions to include in prompt
instructions = parser.get_format_instructions()
print(instructions)
# "Your response should be a list of comma separated values, eg: 'foo, bar, baz'"

# Parse LLM response
response = "Apple, Banana, Orange"
result = parser.parse(response)
print(result)  # ['Apple', 'Banana', 'Orange']

2. StructuredOutputParser

Parse LLM responses into structured objects with defined fields:

from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="weather", description="The weather condition"),
    ResponseSchema(name="temperature", description="Temperature in Fahrenheit"),
    ResponseSchema(name="humidity", description="Humidity percentage"),
]

parser = StructuredOutputParser.from_response_schemas(response_schemas)

# Get instructions to put in prompt
instructions = parser.get_format_instructions()
prompt = PromptTemplate(
    template="Describe the weather: {format_instructions}",
    input_variables=[],
    partial_variables={"format_instructions": instructions}
)

response = llm.predict(prompt.format())
# LLM returns:
# ```json
# {
#   "weather": "sunny",
#   "temperature": 72,
#   "humidity": 30
# }
# ```

result = parser.parse(response)
print(result)
# {'weather': 'sunny', 'temperature': 72, 'humidity': 30}
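Note that the LLM wrapped its answer in a markdown code fence; a structured parser has to strip that fence before it can load the JSON. A minimal plain-Python sketch of that step (illustrative, not LangChain's actual implementation):

```python
import json
import re

def parse_json_markdown(text: str) -> dict:
    """Extract a JSON object from text, tolerating a ```json code fence."""
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

llm_output = '''```json
{
  "weather": "sunny",
  "temperature": 72,
  "humidity": 30
}
```'''
print(parse_json_markdown(llm_output))
# {'weather': 'sunny', 'temperature': 72, 'humidity': 30}
```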

3. PydanticOutputParser

Use Python type hints for strict validation:

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age")
    occupation: str = Field(description="What they do for work")

parser = PydanticOutputParser(pydantic_object=PersonInfo)

instructions = parser.get_format_instructions()
prompt = PromptTemplate(
    template="Extract person info from: '{text}'\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": instructions}
)

text = "John Smith is a 35-year-old software engineer"
response = llm.predict(prompt.format(text=text))

person = parser.parse(response)
print(person.name)          # "John Smith"
print(person.age)           # 35
print(person.occupation)    # "software engineer"

4. JSONOutputParser

from langchain.output_parsers import JsonOutputParser
from pydantic import BaseModel
from typing import List

# pydantic_object expects a Pydantic model (not a raw dict schema);
# the model's JSON schema is embedded in the format instructions.
# Without it, the parser simply accepts any valid JSON.
class Profile(BaseModel):
    name: str
    age: int
    skills: List[str]

parser = JsonOutputParser(pydantic_object=Profile)

response = '''
{
  "name": "Alice",
  "age": 30,
  "skills": ["Python", "JavaScript", "React"]
}
'''

result = parser.parse(response)
print(result)
# {'name': 'Alice', 'age': 30, 'skills': ['Python', 'JavaScript', 'React']}

5. BooleanOutputParser

from langchain.output_parsers import BooleanOutputParser

parser = BooleanOutputParser()

response = "Yes, the statement is correct."
result = parser.parse(response)
print(result)  # True

response = "No, I disagree."
result = parser.parse(response)
print(result)  # False
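The core of any boolean parser is a keyword scan over the response. A minimal sketch in plain Python (the `parse_boolean` helper is illustrative; LangChain's own matching rules may be stricter about surrounding punctuation):

```python
def parse_boolean(text: str, true_val: str = "YES", false_val: str = "NO") -> bool:
    """Map a free-text yes/no answer to a bool by scanning for keywords."""
    cleaned = text.upper()
    # Scan for the affirmative token first so "Yes, the statement..." parses
    if true_val in cleaned:
        return True
    if false_val in cleaned:
        return False
    raise ValueError(f"Expected {true_val} or {false_val} in: {text!r}")

print(parse_boolean("Yes, the statement is correct."))  # True
print(parse_boolean("No, I disagree."))                 # False
```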

Custom Output Parsers

Create your own for domain-specific needs:

from langchain.schema import BaseOutputParser

class SentimentParser(BaseOutputParser):
    def parse(self, text: str) -> dict:
        """Parse sentiment and confidence score"""
        # Simple example - in production, use ML model
        text_lower = text.lower()

        if 'positive' in text_lower or 'great' in text_lower:
            sentiment = 'positive'
        elif 'negative' in text_lower or 'bad' in text_lower:
            sentiment = 'negative'
        else:
            sentiment = 'neutral'

        # Extract confidence (0-100)
        import re
        match = re.search(r'(\d+)%?', text)
        confidence = int(match.group(1)) / 100 if match else 0.5

        return {
            'sentiment': sentiment,
            'confidence': confidence,
            'raw_text': text
        }

parser = SentimentParser()
result = parser.parse("This is great! 85% confidence")
print(result)
# {'sentiment': 'positive', 'confidence': 0.85, 'raw_text': '...'}

Chaining Parsers

Use multiple parsers for complex transformations:

from langchain.output_parsers import CommaSeparatedListOutputParser

# First parse as a list (include the parser's format instructions
# in the prompt so the LLM replies comma-separated)
list_parser = CommaSeparatedListOutputParser()
response = llm.predict("List one fruit, one vegetable, and one protein, comma separated")
items = list_parser.parse(response)
# ["apple", "broccoli", "chicken"]

# Then process each item
results = []
for item in items:
    detailed_response = llm.predict(f"Describe {item} nutritional value")
    results.append(detailed_response)

Real-World Examples

Example 1: Extract Key Information from Text

from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

response_schemas = [
    ResponseSchema(name="title", description="Title of the article"),
    ResponseSchema(name="author", description="Author name"),
    ResponseSchema(name="date", description="Publication date"),
    ResponseSchema(name="summary", description="2-3 sentence summary"),
]

parser = StructuredOutputParser.from_response_schemas(response_schemas)

template = """Extract key information from the following text:

Text: {text}

{format_instructions}"""

prompt = PromptTemplate(
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
    template=template
)

article_text = """
Cloud Computing: The Future of Technology
By Sarah Johnson
Published March 15, 2024

Cloud computing is revolutionizing how businesses operate...
"""

llm = OpenAI()
output = llm.predict(prompt.format(text=article_text))
extracted = parser.parse(output)

print(extracted)
# {'title': 'Cloud Computing: The Future of Technology',
#  'author': 'Sarah Johnson',
#  'date': 'March 15, 2024',
#  'summary': '...'}

Example 2: Validate API Response Format

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, validator
from typing import Optional

class APIResponse(BaseModel):
    status: str
    data: dict
    error: Optional[str] = None

    @validator('status')
    def status_valid(cls, v):
        if v not in ['success', 'error']:
            raise ValueError('status must be success or error')
        return v

parser = PydanticOutputParser(pydantic_object=APIResponse)

# Validate LLM-generated API response
response_text = '''
{
  "status": "success",
  "data": {"user_id": 123},
  "error": null
}
'''

response = parser.parse(response_text)
# Pydantic validates automatically
print(response.status)  # 'success'
print(response.data)    # {'user_id': 123}

Example 3: Multi-Step Parsing

# Step 1: Get raw response
raw = llm.predict("Generate 3 movie recommendations with ratings")
# "1. Avatar (8.5/10), 2. Inception (9/10), 3. Oppenheimer (8/10)"

# Step 2: A CommaSeparatedListOutputParser won't help here - the
# response is a numbered list, not comma-separated values

# Step 3: Extract structured data with a regex instead
import re
matches = re.findall(r'\d+\.\s*(.+?)\s*\((\d+\.?\d*)/10\)', raw)
movies = [{'title': title, 'rating': float(rating)} for title, rating in matches]

print(movies)
# [{'title': 'Avatar', 'rating': 8.5}, ...]

Error Handling

from langchain.output_parsers import PydanticOutputParser
from langchain.schema import OutputParserException
from pydantic import BaseModel

class StrictData(BaseModel):
    id: int
    name: str

parser = PydanticOutputParser(pydantic_object=StrictData)

# Invalid response: id cannot be coerced to int
invalid_response = '{"id": "not_a_number", "name": "Alice"}'

try:
    result = parser.parse(invalid_response)
except OutputParserException as e:
    # LangChain wraps the underlying Pydantic ValidationError
    print(f"Parsing failed: {e}")
    # Handle error gracefully
    # Retry with better prompt instructions

Best Practices

  1. Always provide format instructions - Include parser instructions in your prompt
  2. Be specific - Tell the LLM exactly what format you expect
  3. Handle errors - Wrap parsing in try-except blocks
  4. Validate output - Use Pydantic for strict validation
  5. Test thoroughly - LLMs may generate unexpected formats
  6. Chain parsers - Combine simple parsers for complex outputs
  7. Log results - Track parsing failures for debugging
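Points 3 and 5 can be combined into a simple retry loop that feeds the parse error back to the model. A sketch in plain Python with a stubbed LLM (`parse_with_retry`, `llm_call`, and `stub_llm` are all illustrative names, not LangChain APIs):

```python
import json

def parse_with_retry(llm_call, parse, prompt: str, max_attempts: int = 3):
    """Call the LLM and parse; on failure, re-ask with the error appended."""
    last_error = None
    for _ in range(max_attempts):
        response = llm_call(prompt)
        try:
            return parse(response)
        except Exception as exc:
            last_error = exc
            # Feed the failure back so the model can correct its format
            prompt = (f"{prompt}\n\nYour last reply failed to parse "
                      f"({exc}). Reply in the exact format requested.")
    raise ValueError(f"Parsing failed after {max_attempts} attempts: {last_error}")

# Demo with a stub "LLM" that fails once, then returns valid JSON
attempts = []
def stub_llm(p):
    attempts.append(p)
    return "not json" if len(attempts) == 1 else '{"status": "ok"}'

print(parse_with_retry(stub_llm, json.loads, "Return JSON"))
# {'status': 'ok'}
```

LangChain also ships retry and fixing parsers for this pattern; the loop above just makes the mechanism explicit.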

Comparing Output Parsers

Parser               Use Case                  Strictness
CommaSeparatedList   Simple lists              Low
StructuredOutput     Fixed fields              Medium
Pydantic             Type-validated objects    High
JSON                 Custom JSON schema        High
Custom               Domain-specific formats   Any

Performance Considerations

# Avoid re-parsing same format repeatedly
# Cache the parser
parser = PydanticOutputParser(pydantic_object=MyClass)

# Reuse for many LLM calls
for item in items:
    response = llm.predict(f"Process {item}")
    parsed = parser.parse(response)  # Fast reuse

Conclusion

Output parsers are essential for production LLM applications:

  • Structured data - Convert text to usable formats
  • Validation - Ensure LLM responses match expectations
  • Error handling - Gracefully handle parsing failures
  • Chaining - Combine parsers for complex workflows

Use the right parser for your use case and always validate LLM output.

